nboley / idr

IDR

License: GNU General Public License v2.0

Python 88.81% C 11.19%

idr's People

imk1 vd4mmind wwang-chcn lpantano simon-coetzee superbobry lzamparo sambuckberry lone-suslik jchenpku sklasfeld leepc12 kundajelab amirshams84 charlie-george andysaurin mantaspanos xiangyupan zorrodong lzlgboy acgtcoder xjyx longzhao1992 rhysnewell pollytikhonova leeyang516 nsamzhao ayannawade alexkent-qzy j-sp4 jwschroeder3 xma82 jz-jiang ua0 sequana narzouni jieqianghe chanderkantchaudhary renzhonglu tdw-89

idr's Issues

idr.py line 45

The code, as it is, does not work with bed files. Changing line 45 of idr.py should do the trick. Currently, that line says "data[5]". I believe it should be "data[3]", as you want the strand information.

What are the columns when the inputs are bed files?

Hi,
I'm hoping I can get some answers even though this GitHub repo looks pretty deserted.

I ran IDR on input files that look like this:

track type=narrowPeak name="peaks1.bed" description="peaks1.bed" nextItemButton=on
chr1	12237	15856	peaks1.bed_narrowPeak1	367	.	0.00000	0.00000	0.00000	905
chr1	16017	16585	peaks1.bed_narrowPeak2	90	.	0.00000	0.00000	0.00000	425

The fifth column is -log10(pvalue) of peaks.
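For reference, the ten narrowPeak columns (ENCODE's BED6+4 layout) can be read as below. This is a minimal sketch, not IDR's own loader; `NarrowPeak`, `parse_narrowpeak_line` and `iter_narrowpeak` are hypothetical names. In this layout the fifth column is the integer `score`, while `signalValue`, `pValue` and `qValue` are columns 7 to 9 (indices 6 to 8 after splitting), and index 5 is the strand.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int           # 0-based, half-open
    end: int
    name: str
    score: int           # column 5
    strand: str          # column 6 (index 5)
    signal_value: float  # column 7
    p_value: float       # column 8, -log10 p-value or -1 if not reported
    q_value: float       # column 9, -log10 q-value or -1 if not reported
    summit: int          # column 10, summit offset from start or -1


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(fields)}")
    chrom, start, end, name, score, strand, sig, p, q, summit = fields[:10]
    return NarrowPeak(chrom, int(start), int(end), name, int(score), strand,
                      float(sig), float(p), float(q), int(summit))


def iter_narrowpeak(lines):
    for line in lines:
        # Skip UCSC track/browser header lines, comments and blank lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        yield parse_narrowpeak_line(line)
```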

I ran idr using the following command:

idr --samples peaks1.bed peaks2.bed \
	--input-file-type bed \
	--rank 5 \
	--output-file peaks1_peaks2-idr  \
	--plot \
	--log-output-file peaks1_peaks2.idr.log

The output I got is this:

chr3	122863912	122880767	.	1000	.	5.00	5.00	122863912	122880767	58149.00000	122864004	122880714	30237.00000
chr18	10456599	10467214	.	1000	.	5.00	5.00	10456599	10467214	49558.00000	10456626	10467173	20877.00000

What are the columns? This output does not have the same number of columns as the narrowPeak output file.

score does not correspond with IDR

Hi,

I ran IDR and am looking at my output, trying to understand why some peaks have IDR < 0.05 (based on column 11) yet their score in column 5 does not match what the manual describes:

peaks with an IDR of 0.05 have a score of int(-125log2(0.05)) = 540

Is it global IDR or local IDR that is used here? Additionally, in my output column 12 is the beginning of the coordinates for the first replicate. Am I missing something here?

Thanks,
Meeta
B_pa_vs_C_pa-idr.txt
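One way to see which IDR column the score tracks is to recompute the expected column-5 score from each IDR column and compare. A minimal sketch, assuming (a) the score follows the manual's formula int(-125 * log2(IDR)) quoted above, capped at 1000 (consistent with the 1000 scores in the outputs earlier on this page), and (b) the IDR columns store -log10(IDR); `idr_score` and `idr_from_neg_log10` are hypothetical helpers.

```python
import math


def idr_score(idr: float) -> int:
    """Expected column-5 score for an IDR value: int(-125 * log2(IDR)).

    Assumption: the score is capped at 1000, so an IDR of 0 maps to 1000.
    """
    if idr <= 0:
        return 1000
    return min(int(-125 * math.log2(idr)), 1000)


def idr_from_neg_log10(value: float) -> float:
    """Convert a -log10(IDR) value back to an IDR (assumed column encoding)."""
    return 10 ** (-value)
```

For each output line, `idr_score(idr_from_neg_log10(float(col11)))` and the same for column 12 can be compared with column 5 to see which one matches.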

Can I use IDR to analyze ChIP-seq or CUT&Tag data of histones?

Hi,
I'm confused about how to get reproducible peaks from replicates of histone data. IDR is recommended in the ENCODE pipeline for TF ChIP-seq data. What about histone data? Can I also use IDR for this task?
I'm looking forward to your reply.
Thanks~
Hanwen

idr error with narrowPeaks

Hi,
I ran idr with the test narrowPeak files and with my own narrowPeak files. Both ended with an error:

[wszheng@guru idr-2.0.2]$ idr --samples tests/data/peak1 tests/data/peak2
WARNING: Cython does not appear to be installed.- falling back to much slower python method.
Initial parameter values: [0.10 1.00 0.20 0.50]
/home/wszheng/Install/Python-3.6.1/bin/idr --samples tests/data/peak1 tests/data/peak2
Traceback (most recent call last):
  File "/home/wszheng/Install/Python-3.6.1/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.2', 'idr')
  File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/pkg_resources/__init__.py", line 654, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1434, in run_script
    exec(code, namespace, namespace)
  File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr.egg-info/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/idr.py", line 759, in main
    fix_mu=args.fix_mu, fix_sigma=args.fix_sigma )
  File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/idr.py", line 391, in fit_model_and_calc_idr
    fix_mu=fix_mu, fix_sigma=fix_sigma)
  File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/optimization.py", line 468, in estimate_model_params
    fix_mu=fix_mu, fix_sigma=fix_sigma)
  File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/optimization.py", line 420, in EMP_with_pseudo_value_algorithm
    z1 = compute_pseudo_values(r1, theta[0], theta[1], theta[3])
  File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/utility.py", line 46, in py_compute_pseudo_values
    -10, 10, EPS ) )
TypeError: py_cdf_i() takes 6 positional arguments but 7 were given

Can anyone help me figure out this problem? Thanks very much.

Log output

Add option to log to file and unify the logging code (avoid prints in sub-modules)

Issue with CLIP-seq data

I downloaded CLIP-seq narrowPeak files from ENCODE and am trying to run IDR directly on them.

I ran the command:

idr --samples rep1.peak rep2.peak -i 0.05 --rank p.value --input-file-type narrowPeak

I first get the warning:

FutureWarning: comparison to None will result in an elementwise object comparison in the future.
  if localIDRs == None or IDRs == None:

Then immediately get the error:

    mean(x.summit-x.start for x in m_pk.pks[key+1])
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

I have tested these same commands on traditional chip-seq data and on the example data provided by IDR. Does anyone have any suggestions?

Add both IDR plots

--rank option with arg 'p.value' raises a ValueError

When I call idr thusly: idr --verbose --samples $1 $2 --input-file-type narrowPeak --rank p.value -o $outdir/$outfile 2>$outdir/idr-errors.txt

IDR raises a ValueError about the column I'm using to rank my peaks:

Loading the peak files
Traceback (most recent call last):
  File "/Users/zamparol/anaconda/envs/py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-macosx-10.6-x86_64.egg/idr/idr.py", line 717, in load_samples
    signal_index = int(args.rank) - 1
ValueError: invalid literal for int() with base 10: 'p.value'

I'm trying to rank by p.value of narrowPeak files. The usage string (and other issues in this repo) suggest that --rank p.value is the proper way to indicate ranking by P value. Any ideas why I'm seeing this?

My version:

(py3) mski1743:day4 zamparol$ idr --version
IDR 2.0.3
(py3) mski1743:day4 zamparol$ python --version
Python 3.5.2 :: Continuum Analytics, Inc.
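The traceback shows `int(args.rank) - 1` being applied to the literal string `p.value`, so in this installed code path only a numeric column was accepted. A minimal sketch of a resolver that accepts both forms follows; `resolve_rank_column` and `NARROWPEAK_RANK_COLUMNS` are hypothetical names, not IDR's API, and the mapping follows the narrowPeak layout (signalValue, pValue and qValue in columns 7, 8 and 9).

```python
# Hypothetical mapping from named ranks to 0-based narrowPeak column indices.
NARROWPEAK_RANK_COLUMNS = {"signal.value": 6, "p.value": 7, "q.value": 8}


def resolve_rank_column(rank: str) -> int:
    """Return a 0-based column index for a --rank value.

    Named ranks map to narrowPeak columns 7-9; anything else must be a
    1-based column number.
    """
    if rank in NARROWPEAK_RANK_COLUMNS:
        return NARROWPEAK_RANK_COLUMNS[rank]
    try:
        column = int(rank)
    except ValueError:
        raise ValueError(
            f"--rank must be one of {sorted(NARROWPEAK_RANK_COLUMNS)} "
            f"or a 1-based column number, got {rank!r}") from None
    if column < 1:
        raise ValueError(f"column numbers are 1-based, got {column}")
    return column - 1
```

Checking the named forms first means a name is never passed to `int()`, which is the step that fails in the traceback above.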

installation error IDR

I am getting an error while installing IDR.
command used: python3 setup.py install

Error:
Traceback (most recent call last):
  File "setup.py", line 2, in <module>
    import numpy
ImportError: No module named 'numpy'

I have already installed numpy, and pip shows:
"Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages/numpy-1.11.1-py2.7-linux-x86_64.egg"

I still get the same error. Any idea how to solve it?

Add links to relevant literature in README.md?

Would be nice to have. Is there an accompanying paper?

Weird-looking plots

When I run idr on peak lists produced by MACS on my data, I get some bizarre-looking plots:

The plots look like fractal versions of the typical IDR plots shown here: http://ccg.vital-it.ch/var/sib_april15/cases/landt12/idr.html

Do you have any idea what might be causing this? I'm invoking the script as:

idr --samples sample1.narrowPeak sample2.narrowPeak \
    --peak-list oraclepeaks.narrowPeak --input-file-type narrowPeak \
    --output-file idrValues.txt --output-file-type narrowPeak \
    --log-output-file idr.log --plot --random-seed 1986

I can share my peak files if you want them.

Version mismatch between code and documentation (README.md)

README.md says

Get the current repo
wget https://github.com/nboley/idr/archive/2.0.2.zip
[...]
Command Line Arguments
[...]
--use-best-multisummit-IDR
                        Set the IDR value for a group of multi summit peaks (a group of peaks with the same chr/start/stop but different summits) to the best value across all of these peaks. This is a work around for peak callers that don't do a good job splitting scores across multi summit peaks (e.g. MACS). If set in conjunction with --plot two plots will be created - one with alternate summits and one without.  Use this option with care.

This option is not present in v. 2.0.2.
2.0.2 has bugs as well. Why not point to 2.0.3, or simply tell users to git clone the repo?
I wasted 2+ hours trying to run 2.0.2, which kept crashing. See #44.

Cannot run IDR with more than 2 replicates

Hi,

I am trying to run IDR on a set of 3 replicates and I get the following error:
idr: error: unrecognized arguments: K9AcY4_2/MACS2_K9acY4_2/K9acY4_2_macs2_peaks.broadPeak

This is the third replicate. If I run it for 2 replicates things are working, but not for more than 2.
Any advice on the issue?
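The error text is what Python's argparse reports when a value is left over after all options have been consumed. Assuming `--samples` is declared with `nargs=2` (not confirmed here, but consistent with the `f1, f2 = [...]` unpacking visible in later tracebacks on this page), a third file is simply never consumed, as this minimal reproduction shows; `split_samples` is a hypothetical helper.

```python
import argparse


def split_samples(argv):
    """Parse argv with a two-file --samples option and also return leftovers."""
    parser = argparse.ArgumentParser(prog="idr")
    # Assumption: mirrors a --samples option declared with nargs=2.
    parser.add_argument("--samples", nargs=2, required=True)
    args, leftover = parser.parse_known_args(argv)
    return args.samples, leftover


samples, leftover = split_samples(["--samples", "rep1.bed", "rep2.bed", "rep3.bed"])
print(samples)   # ['rep1.bed', 'rep2.bed']
print(leftover)  # ['rep3.bed']
```

With `parse_args` instead of `parse_known_args`, argparse would print `idr: error: unrecognized arguments: rep3.bed` and exit, matching the message above.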

An error when calling idr from the command line

Hi,
I've successfully installed idr on my Linux server.
When I call idr from the command line inside the idr-2.0.2/ directory, it works fine, but when I call it from any other directory, I get an error:

Traceback (most recent call last):
  File "/home/wszheng/Install/Python-3.6.1/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.2', 'idr')
  File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/pkg_resources/__init__.py", line 654, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1434, in run_script
    exec(code, namespace, namespace)
  File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/idr-2.0.2-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 8, in <module>
    import idr.idr
ModuleNotFoundError: No module named 'idr.idr'

Does anyone know what's the cause of this problem?
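Since the same installation works from inside idr-2.0.2/ but not elsewhere, one quick diagnostic is to ask the interpreter where it would import `idr` and `idr.idr` from, in each directory. A minimal sketch using `importlib.util.find_spec`; `locate_module` is a hypothetical helper, it should be run with the same Python that the `idr` script uses, and note that `find_spec` on a dotted name imports the parent package first.

```python
import importlib.util
import sys


def locate_module(name: str):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'idr' for 'idr.idr') is missing.
        return None
    # spec.origin is None for namespace packages (directories without __init__.py).
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("python:", sys.executable)
    for name in ("idr", "idr.idr"):
        print(name, "->", locate_module(name))
```

If `idr` resolves to different locations in the two directories, or to a namespace package with no `__init__.py`, that points at the installation rather than at IDR itself.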

idr excludes top peaks

Dear IDR developers,
Could you please help me with this issue? The IDR pipeline gives me these plots, and I see that the most highly reproducible top peaks have high IDR p-values. Do you know why?

Thank you!

augment IDR oracle peaks with peaks that have the same boundaries

For multi-summit peaks, add an option to set the IDR value to the maximum value across all summits.

No module named 'idr.idr'

Hi,

I'm pretty new to ChIP-seq analysis and have been trying to install idr on my Mac. I downloaded the idr package along with Python 3.6.1 (Anaconda 4.4.0). However, when I ran 'idr -h' I received this error:

Traceback (most recent call last):
  File "/Users/benflynn_1/anaconda3/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/Users/benflynn_1/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 744, in run_script
  File "/Users/benflynn_1/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 1499, in run_script
  File "/Users/benflynn_1/anaconda3/lib/python3.6/site-packages/idr-2.0.3-py3.6-macosx-10.7-x86_64.egg/EGG-INFO/scripts/idr", line 8, in <module>
    import idr.idr
ModuleNotFoundError: No module named 'idr.idr'

I have tried re-installing idr but am still receiving the same error. Does anyone know what has gone wrong and how to fix this as I can't find a solution with simple googling?

Cheers,

Ben

IDR with small number of peaks

Hi,

I'm working with a relatively small number of peaks (~100, in contrast to most use cases, which probably have thousands or more) because of a small genome size and relatively specific binding behavior. Not being familiar with the details of the statistical background model (though I get the overall point), do I have to consider anything special when interpreting my results?

Can you recommend any literature for the statistical background?

Cheers,
Ben

ValueError when running test peak files

Hi, I'm running idr-2.0.2 in a virtual environment installed with python3.6 and I'm getting the following error running the test command (I also got this error on a real dataset).

Could this be a specific issue with running in a virtual environment?

Thanks,
-Dave

(IDR) [oliverd@aphrodite bin]$ idr --samples ../tests/data/peak1 ../tests/data/peak1
Initial parameter values: [0.10 1.00 0.20 0.50]
Final parameter values: [0.05 0.20 0.99 0.99]
/athena/hssgenomics/scratch/programs/IDR/bin/idr --samples ../tests/data/peak1 ../tests/data/peak1
Traceback (most recent call last):
  File "/athena/hssgenomics/scratch/programs/IDR/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.2', 'idr')
  File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1500, in run_script
    exec(code, namespace, namespace)
  File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/idr-2.0.2-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/idr-2.0.2-py3.6-linux-x86_64.egg/idr/idr.py", line 774, in main
    useBackwardsCompatibleOutput=args.use_old_output_format)
  File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/idr-2.0.2-py3.6-linux-x86_64.egg/idr/idr.py", line 415, in write_results_to_file
    if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
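The same line, `if localIDRs == None or IDRs == None:`, fails in several other reports on this page. With newer NumPy releases, `array == None` is evaluated elementwise (the FutureWarning in the CLIP-seq report announces exactly this change), so the `if` has to take the truth value of a multi-element array. A minimal sketch reproducing the failure and the usual idiomatic fix, comparing by identity with `is None`; the two function names are hypothetical, only the condition is copied from the traceback, and this does not describe how upstream fixed it.

```python
import numpy as np


def check_old(localIDRs, IDRs):
    # Condition copied from the traceback. On current NumPy, `array == None`
    # returns a boolean array, and `or` then calls bool() on it.
    if localIDRs == None or IDRs == None:  # noqa: E711
        return "missing"
    return "ok"


def check_fixed(localIDRs, IDRs):
    # Identity comparison is never elementwise, so it is safe for arrays.
    if localIDRs is None or IDRs is None:
        return "missing"
    return "ok"
```

Calling `check_old` with two multi-element arrays raises the same `ValueError: The truth value of an array ... is ambiguous`, while `check_fixed` returns `"ok"`.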

bug found?

I was running idr on rep1 and rep2 and found that some of the rep2 output does not match the rep2 narrowPeak file.
I'm copying one of the offending lines below. As you can see, the rep2 end location and summit information are incorrect, while the rep1 information is correct. This makes me worry about the results. I'm hoping someone can respond. Thanks!
idr file:
chr7 29413398 29416918 . 1000 . -1 213.82509 -1 2529 2.46 2.73 29413398 29416918 620.03809 2548 29413672 29416512 213.82509 702

and here is the associated peak in rep2 narrowPeak
chr7 29413672 29414923 rep2 2138 . 20.03013 213.82509 209.88286 676

FYI, the associated peak in rep1 narrowPeak is correct.
chr7 29413398 29416918 rep1 6200 . 25.59149 620.03809 615.40747 2548

--use-best-multisummit-IDR not recognized in idr-2.0.2

Hi,
I am trying to run IDR with the --use-best-multisummit-IDR flag. When I do so, I get an error that the --use-best-multisummit-IDR argument is not recognized. The help function of idr-2.0.2 does not contain the --use-best-multisummit-IDR flag, while it is mentioned in the README.md here (in github). Can you please let me know if you have removed or replaced the flag with some other option?
In total there are 3 options that do not show up in the help function while they are still mentioned in the README.md (--dont-filter-peaks-below-noise-mean, --use-best-multisummit-IDR and --allow-negative-scores).
Thank you.
/Klev

old plots

The old plots (e.g., the number of significant peaks against IDR) were useful for seeing whether certain conditions (antibodies) give more reproducible peaks across replicates than others. It would be helpful if they could be included again.

How to uninstall on a Mac

Hi,

Does anyone know how to uninstall idr on Mac OS X?

Cheers,

Ben

Fails with ValueError on narrow peaks

My files are plain MACS2 narrowPeak files, and idr fails on them. An empty "idrValues.txt" file is created.

$ idr --samples s1.narrowPeak s2.narrowPeak 
Initial parameter values: [0.10 1.00 0.20 0.50]
Final parameter values: [1.99 1.40 0.91 0.57]
idr --samples s1.narrowPeak s2.narrowPeak 

Traceback (most recent call last):
  File "idr", line 10, in <module>
    idr.idr.main()
  File "idr.py", line 774, in main
    useBackwardsCompatibleOutput=args.use_old_output_format)
  File "idr.py", line 415, in write_results_to_file
    if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


$ python3 --version
Python 3.7.2+

--use-nonoverlapping-peaks argument never used

It seems that the argument --use-nonoverlapping-peaks is never used. Instead, the code checks for an argument "--use_nonoverlapping_peaks". When I corrected this and ran it on two replicates where there are non-overlapping peaks, I got the following error:

Traceback (most recent call last):
  File "/cluster/zeng/code/research/software/miniconda/envs/idr/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 750, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1527, in run_script
    exec(code, namespace, namespace)
  File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 876, in main
    useBackwardsCompatibleOutput=args.use_old_output_format)
  File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 482, in write_results_to_file
    merged_peak, IDR, localIDR, output_file_type, signal_type)
  File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 343, in build_idr_output_line_with_bed6
    rv.append( "%i" % min(x.start for x in m_pk.pks[key]))
ValueError: min() arg is an empty sequence

Maybe this functionality hasn't been tested thoroughly due to the mismatched naming? Could you advise?

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Hello! After I installed idr-2.0.2 and ran idr --samples tests/data/peak1 tests/data/peak2, this error occurred:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
How can I fix this? I look forward to your reply, thanks.

error: Use a.any() or a.all()

Hi
When I ran IDR on ATAC-seq data, I repeatedly encountered the error below. Why does it occur?
code:
idr --samples sample1_peaks.narrowPeak sample2_peaks.narrowPeak \
    --input-file-type narrowPeak \
    --rank p.value \
    --output-file $workdir/IDR/rep_idr \
    --plot \
    --log-output-file $workdir/IDR/rep.idr.log

error:
  File "/mypathtodir/site-packages/idr-2.0.2-py3.8-linux-x86_64.egg/idr/idr.py", line 415, in write_results_to_file
    if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Thanks !

Best wishes
Zoe

Parameter requirements for input file in BED format?

Hi,

I want to use IDR to get highly reproducible peaks. For each biological replicate, we called broad peaks separately using SICER. The software's output file is similar to a BED file.
So we ran IDR with "--input-file-type bed". However, it failed with the following error message:

Traceback (most recent call last):
  File "/home/yuandy/anaconda3/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/home/yuandy/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 651, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/yuandy/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1448, in run_script
    exec(code, namespace, namespace)
  File "/home/yuandy/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/home/yuandy/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 839, in main
    merged_peaks, signal_type = load_samples(args)
  File "/home/yuandy/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 720, in load_samples
    raise ValueError("For bed files --signal-type must either "\
ValueError: For bed files --signal-type must either be set to score or an index specifying the column to use.

How should we set parameters when running with a BED file, and are there any other requirements for the BED file itself?
Looking forward to your reply! Thanks!
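Before running IDR on BED-like files, it can help to check that the chosen column is present and numeric on every line; the `IndexError`, `could not convert string to float` and `Invalid Signal Value: -1` reports on this page are all failures of that kind. A minimal sketch, assuming tab-separated input and a 1-based column number as passed to `--rank`; `check_bed_column` is hypothetical. It also flags negative values, because the `Invalid Signal Value: -1.000000e+00` traceback shows a negative signal being rejected (the README mentions an `--allow-negative-scores` option, which another report says is missing from 2.0.2).

```python
import math


def check_bed_column(lines, column):
    """Report problems with a 1-based column of a tab-separated BED-like file.

    Returns a list of (line_number, message) tuples; an empty list means every
    data line has a finite, non-negative number in `column`.
    """
    if column < 1:
        raise ValueError("column numbers are 1-based")
    index = column - 1
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        if index >= len(fields):
            problems.append((lineno, f"only {len(fields)} columns, need {column}"))
            continue
        try:
            value = float(fields[index])
        except ValueError:
            problems.append((lineno, f"non-numeric value {fields[index]!r}"))
            continue
        # Assumption: negative signals are rejected unless explicitly allowed.
        if not math.isfinite(value) or value < 0:
            problems.append((lineno, f"unusable value {value!r}"))
    return problems
```

In the 2.0.3 traceback further down, `load_bed` also calls `float()` on columns 7 to 9 regardless of `--rank`, so it may be worth running the check on columns 7, 8 and 9 as well as on the rank column.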

IDR on gappedPeak format file

For certain histone modifications, ENCODE uses the gappedPeak representation. Right now IDR doesn't support using gappedPeak files - is there a workaround for this? For example, converting a gappedPeak file to narrowPeak file? Or running IDR on the broadPeak file and selecting features in the gappedPeak file which intersect with it?

idr for more than two samples

Does the program support IDR calculation with more than two samples?
When I pass three files to the --samples argument, I get the following error message:
idr: error: unrecognized arguments: <file name of the third sample>

Thanks,

Tal

idr --samples peak1 peak2 : Number of peaks passing IDR cutoff of 0.05 - 0/50537 (0.0%)

Hi,

I am just running the example command from the manual, idr --samples peak1 peak2, and this is the output:

Initial parameter values: [0.10 1.00 0.20 0.50]
Final parameter values: [1.57 1.26 0.89 0.41]
Number of reported peaks - 50537/50537 (100.0%)

Number of peaks passing IDR cutoff of 0.05 - 0/50537 (0.0%)

I see the same issue with my own peaks. What could be the problem?
Thank you in advance,
Andrés

Order of Peak1 vs. Peak2 after --samples

Thanks for this great tool.

We are finding that the value of IDR changes based on the order of peak1 and peak2 in the --samples argument. For example, "--samples peak1 peak2" may give an IDR of 3.538 while "--samples peak2 peak1" gives an IDR of 3.544. Is this expected behavior, and may I ask why it occurs?

Many thanks.

Error running with bed file

Hi!
I want to use IDR on peak files obtained with the HMMRATAC pipeline.
From this pipeline, we obtained a list of peaks in gappedPeak format (score in the 13th column), where the higher the score, the better the call.
How can I use IDR to assess reproducibility between replicates?
How can I change the column IDR uses for the score, given that a higher score means a better call?

The files are as follows:
chr1 630114 630158 HighCoveragePeak_0 . . 0 0 255,0,0 1 44 0 -1 -1 -1

I ran the following command:
idr --verbose --samples MKN74R2_sorted.bed MKN74R3_sorted.bed --input-file-type bed --rank 13 --output-file MKN74-idr --plot --log-output-file MKN74.idr.log

And obtained the following error:
/home/cjose/Software/anaconda3/bin/idr --verbose --samples MKN74R2_sorted.bed MKN74R3_sorted.bed --input-file-type bed --rank 13 --output-file MKN74-idr --plot --log-output-file MKN74.idr.log
Traceback (most recent call last):
  File "/home/cjose/Software/anaconda3/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 665, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1463, in run_script
    exec(code, namespace, namespace)
  File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 840, in main
    merged_peaks, signal_type = load_samples(args)
  File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 732, in load_samples
    f1, f2 = [load_bed(fp, signal_index) for fp in args.samples]
  File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 732, in <listcomp>
    f1, f2 = [load_bed(fp, signal_index) for fp in args.samples]
  File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 56, in load_bed
    raise ValueError("Invalid Signal Value: {:e}".format(signal))
ValueError: Invalid Signal Value: -1.000000e+00

Thank you in advance for your help.
Cheers,
Celina

Using --rank option no output

When --rank is specified (p.value, signal.value or q.value), no peaks are called. If it is not specified, signal.value is used but for MACS2 called peaks p.value needs to be used.

IDR output peak boundaries are different from input

I've been facing an issue with a subset of my ChIP-seq data. The idr output, in the "old output format" seems to change the peak boundaries of the input files. Here's an example -

idr --samples peaks1.narrowPeak peaks2.narrowPeak --use-old-output-format -o test-overlapped-peaks.txt

One of the peaks output in test-overlapped-peaks.txt is chr1 84274822 84276079 :

chr1    84274822    84276079    .   130.52803   84274772    84275909    .   108.03337   0.00000 0.00000 .

But when I run through my original input (from peaks1.narrowPeak), I found that the actual peak coordinates were - chr1 84274822 84275322

Out of 4477 peaks in my input files, I found that a total of 219 peak boundaries have a wrong end point, like the peak shown above. Note that peak merging is off in the command that I ran.

Curiously, when I looked at the input replicate which had this peak (peaks1.narrowPeak), here is what I saw -

chr1    84274822    84275322    peak_2017   2392    .   46.13288    243.62022   239.28380   265
chr1    84275490    84276079    peak_2018   4054    .   84.39515    411.44415   405.43530   215

Note that the peak_2017 chr1:84274822-84275322 is present as chr1:84274822-84276079 in the IDR output. 84276079 is the peak end point of the next peak i.e. peak_2018.

Also, I found that the IDR values reported in the output shown above is 0. Is there any way you could increase the number of significant places displayed? Perhaps displaying the -log10 values instead?

I tried running the idr command with --input-file-type narrowPeak, but I get the same output.
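One plausible explanation, assuming IDR groups peaks whose intervals overlap transitively across replicates: the rep2 interval 84274772-84275909 in the output line overlaps both peak_2017 (ends at 84275322) and peak_2018 (starts at 84275490), so a generic overlap-grouping pass puts all three peaks in one group, and the rep1 span of that group is exactly 84274822-84276079. The sketch below is a generic sweep over half-open intervals, not IDR's merge code; `group_overlapping` and `span` are hypothetical, and it also chains overlaps within a single replicate.

```python
def group_overlapping(rep1, rep2):
    """Group (start, end) intervals from two replicates into overlap clusters.

    Intervals are half-open and on one chromosome. Returns a list of
    (rep1_members, rep2_members) tuples, one per connected cluster.
    """
    tagged = sorted([(s, e, 0) for s, e in rep1] + [(s, e, 1) for s, e in rep2])
    groups = []
    cur_end = None
    for start, end, rep in tagged:
        if cur_end is None or start >= cur_end:
            groups.append(([], []))  # no overlap with the current cluster
            cur_end = end
        else:
            cur_end = max(cur_end, end)  # extend the current cluster
        groups[-1][rep].append((start, end))
    return groups


def span(peaks):
    """Smallest interval covering all peaks, or None for an empty list."""
    if not peaks:
        return None
    return min(s for s, _ in peaks), max(e for _, e in peaks)


rep1 = [(84274822, 84275322), (84275490, 84276079)]  # peak_2017, peak_2018
rep2 = [(84274772, 84275909)]
groups = group_overlapping(rep1, rep2)
print(len(groups), span(groups[0][0]), span(groups[0][1]))
```

Under this assumption, a replicate peak that bridges two neighbouring peaks in the other replicate is enough to widen the reported boundaries.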

Add plotting code

There should be an option to plot the results.

Error while using --use-nonoverlapping-peaks option

Hello,

Trying to use idr tool with this command:
/share/apps/idr/5fbd010/bin/idr \
    --samples \
    BRD2_minus_JQ1_1.MACS2_peaks.narrowPeak.IDR_ready.temp.Peak BRD2_minus_JQ1_2.MACS2_peaks.narrowPeak.IDR_ready.temp.Peak \
    --input-file-type narrowPeak \
    --rank p.value \
    --output-file test.merged.idr.bed \
    --soft-idr-threshold 0.05 \
    --plot \
    --use-nonoverlapping-peaks \
    --peak-merge-method min \
    --verbose

Getting the following error:
Please see attached file:
TKXL.test.IDR.e293603.txt

It works fine without any errors if I drop the --use-nonoverlapping-peaks option.

ValueError: could not convert string to float

Hello,
I have been getting different errors when running IDR.

If I run IDR v2.0.2, I get the following error:
./idr --input-file-type bed --rank 7 --plot --verbose --samples peaks_TGATCA_clean.bed peaks_PrNotT_clean.bed
Loading the peak files
Merging peaks
Ranking peaks
Initial parameter values: [0.10 1.00 0.20 0.50]
Fitting the model parameters
Iter 0 1.57e+00 4.57e-01 [0.09023906 0.71172818 0.98999476 0.98043616]
Iter 1 2.36e-01 3.29e-01 [0.1093229 0.49997562 0.98999476 0.9755989 ]
Iter 2 1.56e-01 2.29e-01 [0.1093229 0.34967449 0.98999476 0.97019152]
Iter 3 1.12e-01 1.59e-01 [0.1093229 0.24417437 0.98999476 0.96406317]
Iter 4 4.48e-02 6.87e-02 [0.1093229 0.20000658 0.98999476 0.96466309]
Iter 5 2.89e-03 3.72e-03 [0.1093229 0.20000658 0.98999476 0.9675544 ]
Iter 6 1.44e-03 1.86e-03 [0.1093229 0.20000658 0.98999476 0.96899706]
Iter 7 7.46e-04 9.61e-04 [0.1093229 0.20000658 0.98999476 0.96974351]
Iter 8 3.85e-04 4.96e-04 [0.1093229 0.20000658 0.98999476 0.9701287 ]
Iter 9 2.01e-04 2.58e-04 [0.1093229 0.20000658 0.98999476 0.97032952]
Iter 10 1.06e-04 1.36e-04 [0.1093229 0.20000658 0.98999476 0.97043521]
Iter 11 5.60e-05 7.20e-05 [0.1093229 0.20000658 0.98999476 0.97049116]
Iter 12 2.97e-05 3.82e-05 [0.1093229 0.20000658 0.98999476 0.97052088]
Iter 13 1.58e-05 2.04e-05 [0.1093229 0.20000658 0.98999476 0.9705367 ]
Iter 14 8.42e-06 1.08e-05 [0.1093229 0.20000658 0.98999476 0.97054513]
Iter 15 4.49e-06 5.78e-06 [0.1093229 0.20000658 0.98999476 0.97054962]
Iter 16 2.39e-06 3.08e-06 [0.1093229 0.20000658 0.98999476 0.97055201]
Iter 17 0.00e+00 0.00e+00 [0.1093229 0.20000658 0.98999476 0.97055201]
Finished running IDR on the datasets
Final parameter values: [0.11 0.20 0.99 0.97]
Writing results to file
./idr --input-file-type bed --rank 7 --plot --verbose --samples peaks_TGATCA_clean.bed peaks_PrNotT_clean.bed
Traceback (most recent call last):
  File "./idr", line 10, in <module>
    idr.idr.main()
  File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/idr/idr.py", line 774, in main
    useBackwardsCompatibleOutput=args.use_old_output_format)
  File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/idr/idr.py", line 415, in write_results_to_file
    if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

If I run IDR v2.0.3, I get:
./idr --input-file-type bed --rank 7 --plot --verbose --samples ENCFF437QMO.bed ENCFF025CWO.bed
Traceback (most recent call last):
  File "./idr", line 10, in <module>
    idr.idr.main()
  File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.3-py3.7-linux-x86_64.egg/idr/idr.py", line 828, in main
    merged_peaks, signal_type = load_samples(args)
  File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.3-py3.7-linux-x86_64.egg/idr/idr.py", line 726, in load_samples
    f1, f2 = [load_bed(fp, signal_index) for fp in args.samples]
  File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.3-py3.7-linux-x86_64.egg/idr/idr.py", line 726, in <listcomp>
    f1, f2 = [load_bed(fp, signal_index) for fp in args.samples]
  File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.3-py3.7-linux-x86_64.egg/idr/idr.py", line 64, in load_bed
    float(data[6]), float(data[7]), float(data[8])
ValueError: could not convert string to float: 'chr1_minus_88541_827795'

My bed files were generated using grit (following the script from the ENCODE RAMPAGE protocol) and are organized as follows:

chr1 629361 629492 TSS_chr1_minus_13397_827795_pk1 1000 - 15.0 chr1_minus_13397_827795 chr1_minus_13397_827795 TSS_chr1_minus_13397_827795_pk1 1.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0$
chr1 629580 629610 TSS_chr1_minus_13397_827795_pk2 1000 - 17.0 chr1_minus_13397_827795 chr1_minus_13397_827795 TSS_chr1_minus_13397_827795_pk2 2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0$
chr1 629639 629939 TSS_chr1_plus_629080_634923_pk1 1000 + 1646.0 chr1_plus_629080_634923 chr1_plus_629080_634923 TSS_chr1_plus_629080_634923_pk1 270.0,0.0,0.0,1.0,79.0,6.0,4.0,2.0,3.0,23.0,8.0,97.0,0.$
chr1 629752 629867 TSS_chr1_minus_13397_827795_pk3 1000 - 37.0 chr1_minus_13397_827795 chr1_minus_13397_827795 TSS_chr1_minus_13397_827795_pk3 2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,1.0$
chr1 629968 630009 TSS_chr1_minus_13397_827795_pk4 1000 - 97.0 chr1_minus_13397_827795 chr1_minus_13397_827795 TSS_chr1_minus_13397_827795_pk4 41.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0$

I am using Python 3.7.6

How do I work around this?

keep peaknames

An option to keep not only the coordinates but also the peaknames would be useful.

IndexError: list index out of range

Hi, I am using idr to process two replicate BED files but am getting an IndexError. Here are the command and the shell output:
idr --samples dyadvscolrep1_c3.0_common.bed dyadvscolrep2_c3.0_common.bed
/usr/local/bin/idr --samples dyadvscolrep1_c3.0_common.bed dyadvscolrep2_c3.0_common.bed
Traceback (most recent call last):
  File "/usr/local/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 743, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1498, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 839, in main
    merged_peaks, signal_type = load_samples(args)
  File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 703, in load_samples
    for fp in args.samples]
  File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 703, in <listcomp>
    for fp in args.samples]
  File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 53, in load_bed
    signal = float(data[signal_index])
IndexError: list index out of range
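The last frame shows `load_bed` indexing `data[signal_index]`, so at least one line in one of the input files has fewer columns than the ranking column requires. A quick check like the sketch below can locate such lines. `find_short_lines` is a hypothetical helper, not part of idr; its default of 10 columns assumes narrowPeak input, which appears to be what idr expects when `--input-file-type` is omitted, as in the command above.

```python
import sys


def find_short_lines(path, min_columns=10):
    """Yield (line_number, n_columns) for data lines with fewer than min_columns fields."""
    with open(path) as handle:
        for lineno, raw in enumerate(handle, start=1):
            line = raw.rstrip("\n")
            # Skip blank lines and UCSC header lines such as "track ..." or "browser ...".
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            # Split on any whitespace (tabs or spaces).
            n = len(line.split())
            if n < min_columns:
                yield lineno, n


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for lineno, n in find_short_lines(path):
            print(f"{path}:{lineno}: only {n} columns")
```

Running it as `python find_short_lines.py dyadvscolrep1_c3.0_common.bed dyadvscolrep2_c3.0_common.bed` would list the offending lines. If every line has only 3-6 columns, the files are plain BED and need extra columns (or a different `--input-file-type`/`--rank`) before idr can read them.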

Installation on Windows 7

Hi Nathan,
I am wondering whether IDR can be installed on Windows 7. Is there a procedure I can follow for a smooth installation?
ajeet

Report oracle peak values when oracle peaks are provided

Backwards compatible output mode doesn't work

Fold enrichment value when running IDR with pvalue rank

Hi,

I performed IDR analysis on peaks called by MACS2 with the following command line:

idr --samples ${peakDir}/rep1_sorted_peaks.narrowPeak ${peakDir}/rep2_sorted_peaks.narrowPeak \
--input-file-type narrowPeak \
--rank p.value \
--output-file regular_model-idr \
--plot \
--log-output-file regular_model.idr.log

The IDR results list the peaks ranked by p-value, with signalValue and qvalue set to -1.
Now I would like to retrieve the signalValue (~fold enrichment) for each peak region from the original MACS2 output, for example:

Chr	Start	End	Name	scaledIDR	Strand	signalValue	pvalue	qvalue	peak	globalIDR	localIDR	rep1_Start	rep1_End	rep1_signalValue	rep1_summit	rep2_Start	rep2_End	rep2_signalValue	rep2_summit	fold_enrichment_rep1	pvalue_1	fold_enrichment_rep2	pvalue_2
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.7965	8.68942	2.2494	11.32801
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.7965	8.68942	2.42635	18.19401
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.7965	8.68942	2.15729	14.41833
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.7965	8.68942	2.14088	13.07617
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.7965	8.68942	2.27126	17.29216
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.7965	8.68942	2.23416	22.68965
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.7965	8.68942	2.26469	20.03755
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.7965	8.68942	2.35859	25.66975
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	2.47086	30.05785	2.2494	11.32801
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	2.47086	30.05785	2.42635	18.19401
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	2.47086	30.05785	2.15729	14.41833
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	2.47086	30.05785	2.14088	13.07617
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	2.47086	30.05785	2.27126	17.29216
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	2.47086	30.05785	2.23416	22.68965
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	2.47086	30.05785	2.26469	20.03755
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	2.47086	30.05785	2.35859	25.66975
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.94094	13.9285	2.2494	11.32801
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.94094	13.9285	2.42635	18.19401
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.94094	13.9285	2.15729	14.41833
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.94094	13.9285	2.14088	13.07617
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501	1.94094	13.9285	2.27126	17.29216
...

and the corresponding IDR merged peak:

Chr	Start	End	Name	scaledIDR	Strand	signalValue	pvalue	qvalue	peak	globalIDR	localIDR	rep1_Start	rep1_End	rep1_signalValue	rep1_summit	rep2_Start	rep2_End	rep2_signalValue	rep2_summit
chr10	38815703	38818793	.	288	.	-1	8.68942	-1	2374	0.012076	0.695235	38815703	38818793	8.68942	1449	38815732	38818791	11.32801	1501

I notice that the rep1_signalValue and rep2_signalValue fields report the minimum p-value among each replicate's overlapping peaks, and the merged pvalue column reports the minimum of the two replicate values (signalValue itself is -1).
How should the signalValue (fold enrichment) be estimated for each IDR merged peak?
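From the tables above, each repN_signalValue appears to equal the -log10 p-value of one of that replicate's overlapping MACS2 peaks (8.68942 for rep1, 11.32801 for rep2). One possible approach, sketched under that assumption, is to look up the fold enrichment of the matching MACS2 peak in each replicate and combine the two. Both helpers are hypothetical, and averaging is an arbitrary choice; taking the minimum, the maximum or a weighted mean would be equally defensible.

```python
from statistics import mean


def fold_enrichment_for_merged_peak(rep_pvalue, overlapping, tol=1e-4):
    """Return the fold enrichment of the MACS2 peak whose -log10 p-value matches rep_pvalue.

    overlapping: list of (fold_enrichment, pvalue) pairs for the MACS2 peaks of one
    replicate that overlap the merged IDR peak.
    """
    for fold, pval in overlapping:
        if abs(pval - rep_pvalue) <= tol:
            return fold
    return None  # no match within tol; caller must decide on a fallback


def merged_fold_enrichment(rep1_pvalue, rep1_peaks, rep2_pvalue, rep2_peaks):
    """Average the per-replicate fold enrichments that could be matched (illustrative choice)."""
    values = [fold_enrichment_for_merged_peak(rep1_pvalue, rep1_peaks),
              fold_enrichment_for_merged_peak(rep2_pvalue, rep2_peaks)]
    found = [v for v in values if v is not None]
    return mean(found) if found else None


# Values copied from the chr10:38815703-38818793 rows above.
rep1_peaks = [(1.7965, 8.68942), (2.47086, 30.05785), (1.94094, 13.9285)]
rep2_peaks = [(2.2494, 11.32801), (2.42635, 18.19401), (2.15729, 14.41833)]
print(merged_fold_enrichment(8.68942, rep1_peaks, 11.32801, rep2_peaks))
```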

--version and --help touch/blow away idrValues.txt

idr --version and idr --help both create an empty idrValues.txt in the current directory, which you probably don't want. More significantly, if you run idr --version or idr --help in a directory that already has an idrValues.txt file, it clobbers it.
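Whatever the exact cause inside idr, one pattern that avoids the problem is to keep the output path as a plain string during argument parsing and open the file only after `parse_args()` returns. argparse's built-in `--help` and `--version` actions print their text and exit from inside `parse_args()`, so a file opened afterwards is never touched by them. This is a minimal sketch, not idr's actual parser: the option set is reduced to two flags and the version string is a placeholder.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="idr")
    # Plain string path: nothing is opened or truncated while arguments are parsed.
    parser.add_argument("--output-file", "-o", default="idrValues.txt")
    parser.add_argument("--version", action="version", version="%(prog)s 2.0.3")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Only reached for real runs; --help and --version exit inside parse_args().
    with open(args.output_file, "w") as out:
        out.write("")  # results would be written here
    return args.output_file


if __name__ == "__main__":
    main()
```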

Motivation for parameter clipping

I've been reading the idr source code to complement my understanding of the paper by Li et al. One thing that caught my attention is the predefined parameter ranges in __init__.py. The paper doesn't mention any restrictions, so I would appreciate it if you could comment on why they are required and how the range for each parameter was chosen.
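For context, the mixture model in Li et al. has four parameters: the mean and standard deviation of the reproducible component, its correlation, and the mixing proportion. Some constraints follow from the model itself. The standard deviation must be positive, the correlation must lie strictly inside (-1, 1) for the bivariate normal to be non-degenerate, and the mixing proportion must lie strictly inside (0, 1) for both components to exist. Clipping to tighter bounds also keeps a numerical optimizer away from those edges. The sketch below shows what such clipping looks like; the bounds are illustrative assumptions, not the values in idr's `__init__.py`.

```python
PARAM_NAMES = ("mu", "sigma", "rho", "mix")

# Hypothetical bounds for illustration only; idr's real ranges may differ.
BOUNDS = {
    "mu": (0.0, 20.0),     # reproducible component assumed to sit above the noise mean of 0
    "sigma": (0.2, 20.0),  # must stay > 0 for a valid normal density
    "rho": (0.0, 0.99),    # |rho| < 1 keeps the covariance matrix positive definite
    "mix": (0.01, 0.99),   # mixing proportion strictly inside (0, 1)
}


def clip_params(params):
    """Clip [mu, sigma, rho, mix] into the ranges in BOUNDS."""
    clipped = []
    for name, value in zip(PARAM_NAMES, params):
        low, high = BOUNDS[name]
        clipped.append(min(max(value, low), high))
    return clipped


print(clip_params([25.0, 0.0, 1.2, 0.0]))
```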

TypeError: unorderable types: NoneType() < int()

I ran IDR on the list of peaks where some of the summit values are -1, and I got the following error:

File "/mnt/silencer2/home/yanxiazh/.local/lib/python3.4/site-packages/idr-2.0.3-py3.4-linux-x86_64.egg/idr/idr.py", line 222, in merge_peaks_in_contig
all_intervals.sort()
TypeError: unorderable types: NoneType() < int()

It looks like the error occurs because "-1" is converted to None when the files are loaded, and sorting values that contain None raises an error in Python 3. Is there a way to fix this?

-Yanxiao
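The failure is easy to reproduce outside idr: Python 3 refuses to compare `None` with `int`, so sorting tuples that hold `None` in the same position as an integer raises this TypeError. One possible workaround, sketched with a hypothetical `sort_intervals` helper and made-up interval tuples (not idr's internal data structure), is to sort with a key that maps `None` to a sentinel. Alternatively, replace the -1 summits in the input files with a real offset, such as the peak midpoint, before running idr.

```python
def sort_intervals(intervals, missing=-1):
    """Sort interval tuples, treating None as `missing` so Python 3 can compare them."""
    return sorted(intervals, key=lambda iv: tuple(missing if v is None else v for v in iv))


intervals = [(100, 200, None), (100, 200, 150), (50, 80, None)]
try:
    sorted(intervals)  # compares 150 with None -> TypeError in Python 3
except TypeError as exc:
    print("plain sort fails:", exc)
print(sort_intervals(intervals))
```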

Empty merged peaks list

Hello,

I am having a weird issue with IDR.

I am running the command with two .narrowPeak files as input, p.value for --rank and the --plot option enabled (command below). Another strange thing is that the option --output-file-type narrowPeak is not recognized.

idr --samples 03_macs2/${rep1}/${rep1}_peaks.sort.narrowPeak 03_macs2/${rep2}/${rep2}_peaks.sort.narrowPeak \
--input-file-type narrowPeak \
--rank p.value \
--soft-idr-threshold 0.1 \
-o 05_IDR/${condition}/${condition}.idr \
--log-output-file 05_IDR/${condition}/log_${condition}.idr.log \
--plot

The problem is that the output file IDR produces is empty. The plot looks correct. Here is the log file:

Initial parameter values: [0.10 1.00 0.20 0.50]
Final parameter values: [1.52 1.18 0.81 0.73]

When I enable --only-merge-peaks, it returns a list of all the merged peaks. However, every peak in that list has a score of 0 and a -log10(local_IDR_value) of 0. Here is the log file:

Number of reported peaks - 20206/20206 (100.0%)

Number of peaks passing IDR cutoff of 0.1 - 0/20206 (0.0%)

I know the problem is not in the samples: I tried other files that another bioinformatician had already analyzed successfully (also using the same code they used), and I get the same error.

This is the error that is printed in the terminal:

Traceback (most recent call last):
  File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.2', 'idr')
  File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1462, in run_script
    exec(code, namespace, namespace)
  File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/idr/idr.py", line 774, in main
    useBackwardsCompatibleOutput=args.use_old_output_format)
  File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/idr/idr.py", line 415, in write_results_to_file
    if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

If someone could help me solve the problem, it would be extremely useful for me. Thank you.
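The last frame points at `if localIDRs == None or IDRs == None:`. When `localIDRs` is a NumPy array, `== None` is evaluated element-wise and returns an array, and `or` then needs that array's truth value, which is ambiguous for more than one element; hence the ValueError. The usual fix is an identity test with `is None`. The sketch below uses a hypothetical `has_idr_values` helper, not idr's code, to show the difference. It may explain the crash, though not necessarily why no peaks pass the threshold.

```python
import numpy as np


def has_idr_values(localIDRs, IDRs):
    """Return True when both arrays exist (identity check, not element-wise ==)."""
    return not (localIDRs is None or IDRs is None)


local_idrs = np.array([0.01, 0.2, 0.9])
idrs = np.array([0.005, 0.1, 0.5])

try:
    if local_idrs == None or idrs == None:  # noqa: E711  element-wise -> array
        pass
except ValueError as exc:
    print("== None fails:", exc)

print(has_idr_values(local_idrs, idrs))  # True
print(has_idr_values(None, idrs))        # False
```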

pkg_resources.DistributionNotFound

I installed the latest IDR (2.0.3) with:
python3 setup.py install --/mnt/data0/lizhidan/software/idr-2.0.3
Then I ran:
idr
and got:
  File "/home/lizhidan/.local/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 3254, in <module>
    def _initialize_master_working_set():
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 3237, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 3266, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 584, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 901, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 787, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'idr==2.0.3' distribution was not found and is required by the application

Can someone tell me how to fix it?
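`DistributionNotFound` means the `pkg_resources` wrapper script in `~/.local/bin/idr` could not find an installed `idr==2.0.3` distribution on the running interpreter's search path. One common cause is that the package was installed to a location this interpreter does not search. A quick diagnostic, sketched below with a hypothetical `installed_version` helper, asks the same `python3` whether it can see the distribution and prints the directories it searches. `importlib.metadata` requires Python 3.8 or newer.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of dist_name, or None if this interpreter cannot see it."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("idr version:", installed_version("idr"))
    # If the version is None, check whether the install location appears in this list.
    for entry in sys.path:
        print("  ", entry)
```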