
toolkit's Introduction


ngs-toolkit

This is my NGS analysis toolkit: ngs_toolkit.

Head to the documentation to see how to install and use the toolkit, and have a look at the catalogue of available functions.

Install with:

pip install ngs-toolkit

You might need to add a --user flag to the above command.

toolkit's People

Contributors

afrendeiro, fwzhao


toolkit's Issues

Implement HOMER motif analysis through de novo discovery, reduction and overrepresentation

From @cschmidl:

HOMER reducing motifs:

1. motif search as usual

2. Create a combined motif file (the example below concatenates six input files); each run's homerMotifs.all.motifs is the file with its results

cat \
    /Volumes/NGSdata/projects/project/HOMER/motifs/differentialExpression.Down_G1_vs_G2.txt/homerMotifs.all.motifs \
    /Volumes/NGSdata/projects/project/HOMER/motifs/differentialExpression.Down_G1_vs_G3.txt/homerMotifs.all.motifs \
    /Volumes/NGSdata/projects/project/HOMER/motifs/differentialExpression.Down_G2_vs_G3.txt/homerMotifs.all.motifs \
    /Volumes/NGSdata/projects/project/HOMER/motifs/differentialExpression.Up_G1_vs_G2.txt/homerMotifs.all.motifs \
    /Volumes/NGSdata/projects/project/HOMER/motifs/differentialExpression.Up_G1_vs_G3.txt/homerMotifs.all.motifs \
    /Volumes/NGSdata/projects/project/HOMER/motifs/differentialExpression.Up_G2_vs_G3.txt/homerMotifs.all.motifs \
    > /Volumes/NGSdata/projects/project/HOMER/motifs/project_homerMotifs.all.motifs

3. compareMotifs.pl [options]

compareMotifs.pl /Volumes/NGSdata/projects/project/HOMER/motifs/project_homerMotifs.all.motifs /Volumes/NGSdata/projects/project/HOMER/motifs/project_opening_reduced -info 0.6 -nofacts -pvalue 1e-25 -known /Users/christianschmidl/src/homer/data/knownTFs/vertebrates/known.motifs

4. make a known motif file (not 100% sure why I egreped this, maybe have a quick look at the .motif file)

for i in $(ls /Volumes/NGSdata/projects/project/HOMER/motifs/project_all_reduced/homerResults/*.motif | egrep -v similar | egrep -v RV)
do
    cat $i >> /Volumes/NGSdata/projects/project/HOMER/motifs/project_all_reduced/all.motifs
done

5. Check for enrichment of the defined known motifs: the usual motif search, but specifying the self-made known motif file as the motif source

PROJECTDIR=/Volumes/NGSdata/projects/project
HOMERDIR=${PROJECTDIR}/HOMER
GENOME="mm10"
MOTIFFILE="/Volumes/NGSdata/projects/project/HOMER/motifs/project_all_reduced/all.motifs"
OPTIONS="-nomotif "

for SAMPLE in $(ls ${HOMERDIR}/peaks | egrep differentialExpression.)
do
    echo "findMotifsGenome.pl ${HOMERDIR}/peaks/$SAMPLE $GENOME ${HOMERDIR}/motifs/reducedMotifEnrichment_$(basename "${SAMPLE}" .txt) $OPTIONS -mknown $MOTIFFILE"
done

6. Extract the p-values and motif names from all motif searches and build a dataframe with motifs in the first column and, in the remaining columns, the p-value and fold change (input vs. background) from each motif search, for visualization. A heatmap is nice, but I have also seen bubble plots where bubble size encodes the p-value and color the fold change, or vice versa. No idea how to plot this though... In general I think this is a good approach to cross-compare several samples!

Enrichr step of run_enrichment_jobs missing time

toolkit/ngs_toolkit/general.py

Lines 1920 to 1941 in ffe682e

# Enrichr
if "enrichr" in steps:
    files = glob(results_dir + "/*/*.gene_symbols.txt")
    for file in files:
        dir_ = os.path.dirname(file)
        name = os.path.basename(dir_)
        output_ = file.replace(".gene_symbols.txt", ".enrichr.csv")
        if os.path.exists(get_this_file_or_timestamped(output_)) and (not overwrite):
            continue
        jobs.append(
            [
                name + "_enrichr",
                os.path.join(dir_, name + ".enrichr.log"),
                os.path.join(dir_, name + ".enrichr.sh"),
                ("shortq", 1, 4000),
                "{e} -m ngs_toolkit.recipes.enrichr {f} {o}".format(
                    e=sys.executable, f=file, o=output_
                ),
            ]
        )
for jobname, log_file, job_file, (partition, cores, mem, time), task in jobs:

This leads to an unpacking error in the last line: the Enrichr jobs carry a three-element resource tuple, but the loop unpacks four values (partition, cores, mem, time).
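A minimal reproduction, plus a sketch of the fix. The "01:00:00" time string is an assumption for illustration, not the project's actual default.

```python
# The job list above builds a 3-tuple of resources, while the loop unpacks four values.
jobs = [["x_enrichr", "x.log", "x.sh", ("shortq", 1, 4000), "cmd"]]
failed = False
try:
    for jobname, log_file, job_file, (partition, cores, mem, time), task in jobs:
        pass
except ValueError:  # "not enough values to unpack"
    failed = True

# Sketch of the fix: emit a 4-element resource tuple that includes a time limit
jobs = [["x_enrichr", "x.log", "x.sh", ("shortq", 1, 4000, "01:00:00"), "cmd"]]
for jobname, log_file, job_file, (partition, cores, mem, time), task in jobs:
    pass
```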

Handle sample and group attributes between functions better

There are errors when using ngs_toolkit.Analysis.get_sample_colours within ngs_toolkit.general.unsupervised if group attributes are not given in the same order as they exist in the levels of the relevant matrix's MultiIndex columns, or if all attributes are blank (NaN).

Reported by @cschmidl

divvy 0.4.0 not supported

Reported by @sreichl:
divvy 0.4.0 not supported since divvy.logging is no longer present.

TODO: fix the divvy logging import in a backwards-compatible manner
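A hedged sketch of one backwards-compatible option: try the old divvy.logging import and fall back when it is absent. Falling back to the stdlib logging module is an assumption here; what ngs_toolkit should actually use instead is an open question.

```python
# divvy >= 0.4 dropped divvy.logging; catch the ImportError and fall back.
try:
    from divvy import logging as divvy_logging  # divvy < 0.4
except ImportError:
    import logging as divvy_logging  # fallback (assumption: stdlib logging suffices)

logger = divvy_logging.getLogger("ngs_toolkit")
```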

Package name

name         pro            con
ngs_toolkit  explicit       long; underscore/hyphen; polluted namespace
seeq         unique; small  already on PyPI; ambiguous
...

pybedtools - bedtool cannot turn into dataframe if NAs present in intersect result

toolkit/ngs_toolkit/atacseq.py

Lines 1294 to 1298 in 2e27787

annot = bed.intersect(
    states, wa=True, wb=True, f=frac, loj=True
)
try:
    annot = annot.to_dataframe(usecols=[0, 1, 2, 6])

e.g.

bed
chr1 9844 10460
chr1 180534 181797

states (lift-over from hg38, found in projects/pad-map/data/external/E032...etc.)
chr1 10000 10800 9_Het
chr1 10800 16000 15_Quies
chr1 16000 16200 1_TssA
chr1 16200 19000 5_TxWk
chr1 19000 96080 15_Quies
chr1 96276 96476 15_Quies
chr1 97276 177200 15_Quies

annot
chr1 9844 10460 chr1 10000 10800 9_Het
chr1 180534 181797 . -1 -1 .
chr1 585847 586452 chr1 586020 770220 15_Quies
chr1 629689 630568 chr1 586020 770220 15_Quies

Quick fix from Andre from earlier:

annot.to_dataframe(disable_auto_names=True, header=None, usecols=['a', 'b', 'c', 'g'], names=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
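The failure mode can be reproduced with pandas alone (no pybedtools needed): a left-outer-join intersect writes "." and -1 placeholders into some rows, and reading with explicit names so that label-based usecols works, mirroring the quick fix above, handles them. The column labels here are illustrative.

```python
import io
import pandas as pd

# Two rows of loj intersect output; the second row has no overlap placeholders.
loj = (
    "chr1\t9844\t10460\tchr1\t10000\t10800\t9_Het\n"
    "chr1\t180534\t181797\t.\t-1\t-1\t.\n"
)
# Naming all seven columns up front, then selecting by label, avoids the
# dtype/column-inference failure when NAs are present.
df = pd.read_csv(
    io.StringIO(loj), sep="\t", header=None,
    names=list("abcdefg"), usecols=["a", "b", "c", "g"],
)
print(df)
```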

R dataframe conversion bugs

df = pd.DataFrame(np.asarray(r_df)).T
df.columns = [str(x) for x in r_df.colnames]
df.index = [str(x) for x in r_df.rownames]

With my version of R (etc.), I had to change to the following to work

def r2pandas_df(r_df):
    import numpy as np
    df = pd.DataFrame(np.asarray(r_df))  # No transposition
    df.columns = [s for s in r_df.columns.tolist()]
    df.index = [s for s in r_df.index.tolist()]
    return df

Perhaps this entire function is deprecated? I didn't really find a need to convert an R data frame to a pandas DataFrame.
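One version-tolerant sketch (an assumption, not the library's actual fix): with rpy2's pandas2ri conversion active the object arriving here may already be a DataFrame, while older rpy2 objects expose .colnames/.rownames and arrive column-major. Checking for that up front avoids the transpose guesswork.

```python
import numpy as np
import pandas as pd

def r2pandas_df(r_df):
    if isinstance(r_df, pd.DataFrame):
        return r_df  # already converted (e.g. pandas2ri active), nothing to do
    # Legacy rpy2 path: column-major array plus colnames/rownames attributes
    df = pd.DataFrame(np.asarray(r_df)).T
    df.columns = [str(x) for x in r_df.colnames]
    df.index = [str(x) for x in r_df.rownames]
    return df

already = pd.DataFrame({"a": [1.0, 2.0]})
```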

Recipes

Premade recipes of common analyses, ready to run.

Usage: projectmanager run --recipe="atac-seq_analysis" metadata/project_config.yaml

Bug in fisher test (differential overlap)

First, sorry I am not quoting the exact lines in ngs_toolkit; GitHub's loading of the page is perpetually stuck at "fetching commit history" for me for some reason, so I can't quote the lines directly.

Before I forget: the calculation of column "d" is wrong.

Currently: d = total regions (i.e. matrix_raw.shape[0]) - (size1 + size2 + intersect)

It should actually be:

d = total regions - (a + b + c)   (exclusively group1, exclusively group2, and intersect)

or alternatively:

d = total regions - (size1 - intersect) - (size2 - intersect) - intersect
===> hence d = total regions - size1 - size2 + intersect

I found the error by getting negative values for column d in my comparison, and I suspect this came from an order of operations issue.
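A quick numeric check of the correction above, with made-up counts: the current formula effectively double-subtracts the intersect and can go negative.

```python
total = 100
size1, size2, intersect = 40, 50, 20
a = size1 - intersect  # exclusively group1
b = size2 - intersect  # exclusively group2
c = intersect          # shared by both groups

d_wrong = total - (size1 + size2 + intersect)  # current code: can go negative
d_right = total - size1 - size2 + intersect    # proposed fix

print(d_wrong, d_right)  # -10 30
```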

Incorrect inference of multi-index in the case of non-str index

def fix_dataframe_header(df, force_dtypes=float):
    # First try to see whether it is not a MultiIndex after all
    if df.index.isna().any().sum() == 1:
        df.columns = df.loc[df.index.isna(), :].squeeze()
        df.columns.name = None
        df.index.name = None
        return df.loc[~df.index.isna()]
    # If so, get potential categorical columns and prepare MultiIndex with them
    cols = list()
    for i in df.index[:300]:
        try:
            df.loc[i, :].astype(float)
        except ValueError:
            cols.append(i)
    if len(cols) == 0:
        pass
    elif len(cols) == 1:
        df.columns = df.loc[cols[0]]
        df = df.loc[~df.index.isin(cols)]
    else:
        df.columns = pd.MultiIndex.from_arrays(df.loc[cols].values, names=cols)
        df = df.loc[~df.index.isin(cols)]
    df.index.name = None
    if force_dtypes is not None:
        return df.astype(force_dtypes)
    else:
        return df

Context:
Upon analysis.load_data() of matrix_norm, inference of MultiIndex levels excludes non-str levels, e.g. boolean entries (in my case, toggle/exclude) and numeric factors (e.g. cell_count), regardless of whether numeric attributes were designated in annotate_samples.

See below for what I mean by the bug:
In [41]: analysis.matrix_norm.index
Out[41]:
Index(['exclude', 'toggle', 'cell_count', 'chr1:9843-10840',
'chr1:15983-16492', 'chr1:180528-181802', 'chr1:191092-192113',
'chr1:267741-268289', 'chr1:585847-586457', 'chr1:605341-605844',
...
'chrY:56825538-56826039', 'chrY:56836469-56837142',
'chrY:56841935-56842436', 'chrY:56843501-56844002',
'chrY:56845972-56846473', 'chrY:56849062-56849563',
'chrY:56855527-56856028', 'chrY:56870612-56871192',
'chrY:56873351-56874110', 'chrY:56879337-56879838'],
dtype='object', length=211111)

analysis.collect_coverage - unsorted indexes when fast_and_unsafe = True

if fast_and_unsafe:
    _LOGGER.warning("Using a concatenation method that is not 100% safe.")
    matrix_raw = pd.concat(matrix_raw, axis=1, sort=False)
    matrix_raw = matrix_raw.loc[:, ~matrix_raw.columns.duplicated()].set_index(["chrom", "start", "end"])

For the consensus coverage bed of each sample, the index order of matrix_raw is not in natural sort order (the sample-specific coverage bed produced under distributed mode does not have its regions in natural sort order either).

See below:

top n of sample-specific bed

In [26]: bed = pd.read_csv(f'{analysis.data_dir}/{analysis.samples[1].name}/coverage/{analysis.samples[1].name}.peak_set_coverage.bed', sep='\t', header=None)

In [27]: bed.head()
Out[27]:
0 1 2 3
0 chr1 234880931 234882653 16
1 chr1 243268961 243270032 2
2 chr1 39845476 39846665 96
3 chr1 61865588 61866129 0
4 chr1 77594208 77594852 2

top n of matrix_raw in distributed=True vs. distributed = False or fast_and_unsafe = False

In [20]: df = analysis.measure_coverage(samples=s[0:2], assign=False, save=False)

In [21]: df.head()
Out[21]:
ATAC_KB_PID-1377_PBMC ATAC_KB_PID-1377_Tcell
chr1:9843-10840 18 26
chr1:15983-16492 3 1
chr1:180528-181802 18 18
chr1:191092-192113 8 5
chr1:267741-268289 8 4

In [22]: df = analysis.collect_coverage(samples=s[0:2], assign=False, save=False)
100%|█████████████████████████████████████████████████████| 2/2 [00:00<00:00, 18.73it/s]

In [23]: df.head()
Out[23]:
variable ATAC_KB_PID-1377_PBMC ATAC_KB_PID-1377_Tcell
region
chr1:9843-10840 18 26
chr1:15983-16492 3 1
chr1:180528-181802 18 18
chr1:191092-192113 8 5
chr1:267741-268289 8 4

matrix_raw if distributed and fast_and_unsafe are True

In [24]: df = analysis.collect_coverage(samples=s[0:2], assign=False, save=False, fast_and_unsafe=True)
100%|█████████████████████████████████████████████████████| 2/2 [00:00<00:00, 17.64it/s]
ngs_toolkit:atacseq:L857 (collect_coverage) [WARNING] > Using a concatenation method that is not 100% safe.

In [25]: df.head()
Out[25]:
ATAC_KB_PID-1377_PBMC ATAC_KB_PID-1377_Tcell
region
chr1:234880931-234882653 12 16
chr1:243268961-243270032 3 2
chr1:39845476-39846665 24 96
chr1:61865588-61866129 1 0
chr1:77594208-77594852 5 2
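Restoring natural sort order after the fast concatenation could be sketched as below. The "chr:start-end" region format is taken from the outputs above; this is a standalone illustration, not the ngs_toolkit implementation.

```python
import re

def natural_key(region):
    """Sort key for 'chr:start-end' strings: numeric chromosomes first, then start."""
    chrom, coords = region.split(":")
    start = int(coords.split("-")[0])
    m = re.match(r"chr(\d+)$", chrom)
    # Numeric chromosomes sort numerically; chrX/chrY/etc. sort lexically after them
    return (0, int(m.group(1)), start) if m else (1, chrom, start)

regions = ["chr1:234880931-234882653", "chr1:9843-10840", "chrY:56825538-56826039"]
print(sorted(regions, key=natural_key))
```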

Inconsistent shape of region_annotation vs. region_annotation_b

>>> a.get_peak_genomic_location()

Running analysis.get_peak_genomic_location() leads to inconsistent number of regions between background and real peak set.

Dataset: projects/pad-map (feel free to use)

See:

In [15]: [f'{i}: shape {getattr(analysis, i).shape}' for i in analysis.__dict__ if isinstance(getattr(analysis, i), pd.core.frame.DataFrame)]
Out[15]:
['comparison_table: shape (425, 5)',
'gene_annotation: shape (211108, 6)',
'closest_tss_distances: shape (211468, 6)',
'region_annotation: shape (211105, 4)', <-----
'region_annotation_b: shape (211104, 4)', <----
'region_annotation_mapping: shape (395590, 4)',
'region_annotation_b_mapping: shape (339561, 4)',
'chrom_state_annotation: shape (211108, 4)',
'chrom_state_annotation_b: shape (211108, 4)',
'chrom_state_annotation_mapping: shape (250895, 4)',
'chrom_state_annotation_b_mapping: shape (223218, 4)']

Bug in structure of for loop iterating through gene_set_librar(ies)

# pivot table
if ("correlation" not in plot_types) and ("heatmap" not in plot_types):
    return

Structure currently as follows:

for gene_set_library in gene_set_libraries:
    plot barplot
    plot scatterplot
    if extra steps (i.e. heatmap, correlation) not in steps:
        return  # don't continue to pivot

Hence, if heatmap and/or correlation are not in steps, the barplots/scatterplots are plotted only for the very first gene_set_library, not for all of them. This is a problem for me because I currently want a different top_n for the barplot than for the rest. A simple fix is to make the check at "if extra steps..." a positive condition rather than a negation.
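The positive-check restructuring could look like this sketch. The plot_* entries are placeholders standing in for the actual plotting calls, not ngs_toolkit functions: per-library plots always run, and only the pivot/heatmap work is gated on the requested plot types.

```python
def plot_enrichments(gene_set_libraries, plot_types):
    plotted = []
    for lib in gene_set_libraries:
        plotted.append(("barplot", lib))       # always plotted, for every library
        plotted.append(("scatterplot", lib))
        if ("correlation" in plot_types) or ("heatmap" in plot_types):
            plotted.append(("heatmap", lib))   # pivot + heatmap only when requested
    return plotted

out = plot_enrichments(["GO", "KEGG"], plot_types=["barplot"])
```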

Motif vs. Meme

def collect_differential_enrichment(
    self,
    steps=["region", "lola", "motif", "homer", "homer_consensus", "enrichr"],
    directional=True,
    permissive=True,
    output_dir="{results_dir}/differential_analysis_{data_type}/enrichments",
    input_prefix="differential_analysis",
    output_prefix="differential_analysis",
    differential=None,
):
    """
    Collect the results of enrichment analysis ran after a differential analysis.

    Parameters
    ----------
    steps : :obj:`list`, optional
        Steps of the enrichment analysis to collect results for.
        Defaults to ["region", "lola", "meme", "homer", "enrichr"].

Is there a reason that differential_enrichment recognizes "meme" while collect_differential_enrichment recognizes "motif" as one of the steps? The two are also used somewhat interchangeably within collect_differential_enrichment.

ATACAnalysis feature request - make chrom_state_file / build more explicit

It would be really nice if the statement of which regulatory build was used were more explicit in chrom_state_file.
For example - this could mean:
results/analysis.chrom_state_annotation.csv --> results/analysis.E002.chrom_state_annotation.csv
attribute E002_chrom_state_annotation for analysis object

Pros:

  • more immediately apparent in the results directory
  • allows multiple chromatin-state builds to be analyzed for mixed samples (e.g. PBMCs)

Cons:

  • too specialized?
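The proposed naming can be sketched in a couple of lines, with the build ID ("E002", from the example above) embedded in both the CSV name and the analysis attribute:

```python
build = "E002"  # regulatory-build / epigenome ID, as in the example above
csv_name = f"analysis.{build}.chrom_state_annotation.csv"
attr_name = f"{build}_chrom_state_annotation"
print(csv_name, attr_name)
```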

numpy version might be too low

Hi Andre,
With numpy version=1.15.0, while running ATACSeqAnalysis('blah').get_peak_gene_annotation(tss='different blah'), I encountered:
ModuleNotFoundError - No module named 'numpy.core._multiarray_umath'

A bit of googling and tab completion revealed that version 1.15 of numpy does not have this module (obviously) and this was fixed by upgrading to 1.16.

Thought I'd let you know (I checked the dev version and didn't see a change in the requirements).
Cheers,
F
