novoalab / modphred
modPhred is a pipeline for detection of DNA/RNA modifications from raw ONT data
Home Page: https://modphred.readthedocs.io
License: MIT License
Hello, I ran modPhred with the test data from the tutorial as follows:
run -f ref/ECOLI.fa -o OUTPUT -i PRJEB22772/* -t4 \
-c dna_r10.3_450bps_fast.cfg \
--host /guppy/bin/guppy_basecall_server
However, even though guppy_basecall_server is running, modPhred seems to expect different output in the guppy log and just waits indefinitely.
The guppy log indicates the basecall server is running as:
Starting server on port: ipc:///tmp/07f7-506f-964b-2342
So my question is whether modphred is expecting the basecall server to open a TCP port instead of the above?
Thanks!
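To check which kind of endpoint the server actually opened, the log line quoted above can be parsed. This is only a minimal sketch: it assumes the log line always has the `Starting server on port: ...` form shown in this post, and the function name `parse_guppy_endpoint` is made up for illustration. It says nothing about which transport modPhred itself expects.

```python
import re


def parse_guppy_endpoint(log_line):
    """Return (transport, address) from a 'Starting server on port:' line, or None."""
    match = re.search(r"Starting server on port:\s*(\S+)", log_line)
    if match is None:
        return None
    address = match.group(1)
    if address.startswith("ipc://"):
        # Unix-domain socket path, as in the log quoted in this post
        return ("ipc", address)
    # Assumption: anything else is a TCP port number or address
    return ("tcp", address)


line = "Starting server on port: ipc:///tmp/07f7-506f-964b-2342"
print(parse_guppy_endpoint(line))
```

For the log line from this post, the function reports an `ipc` endpoint, which matches the observation that no TCP port was opened.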
Dear all,
Will modPhred work with RNA004 data basecalled with Dorado to detect RNA m6A?
Thanks and best
Matthias
Hi. Using a container, we have your program running with the provided test dataset. It's beautiful!
My question is about how best to get the clustering working. I have only been doing bioinformatics for a year, so I apologize for the naive question. How should I specify a short region to run the clustering on? Is it passed to the mod_cluster.py command? Or is it specified in the reference genome and the corresponding index? Or some other way?
Thanks for the help.
Hi,
this is just a suggestion: add /opt/modPhred/run to the $PATH within the docker image, so that users don't need to know this internal path and can call it directly from the docker image.
Luca
Hello,
I want to run read clustering on different samples separately (2 CTR and 2 KO), which were originally processed together so I have a common mod.gz for the 4 samples.
I tried to input a "subset" mod.gz with only the columns for the CTR samples, but the output still contains the reads from the KO samples. How can I specify which input samples I want to run the clustering on?
First off, great work! It's always really nice when things work straight out of the box, and it really shows how much care you put into it if a beginner like me can get it to work. It also tackles a really important need in the nanopore methylation calling repertoire.
The model you use, the dam-dcm-specific one, was removed from guppy 4.5.2, and the currently available models struggle to replicate your test data results. Since the ONT software download page forces you to download the latest guppy, newer users won't be able to use the same basecaller. This is not really an issue with your package, but I wasn't sure where to post it because this solution was a little harder to come by than it should have been. For people looking for older versions for Linux: just change the version number in the URL to whichever version you want to install. 4.4.2 is the latest release with the dam-dcm model still available:
wget https://mirror.oxfordnanoportal.com/software/analysis/ont-guppy_4.4.2_linux64.tar.gz
With the dam-dcm model I was once again able to replicate your test data.
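The "just change the version number" tip can be written as a small helper. This is a sketch that simply fills the version into the URL pattern from the wget line above. The function name `guppy_linux_url` is made up, and the code does not check whether a given version is actually hosted on the mirror.

```python
def guppy_linux_url(version):
    """Build the Linux download URL for a given guppy version string."""
    # URL pattern copied from the wget command in this post
    base = "https://mirror.oxfordnanoportal.com/software/analysis"
    return f"{base}/ont-guppy_{version}_linux64.tar.gz"


print(guppy_linux_url("4.4.2"))
```

The printed URL for 4.4.2 is the same one used with wget above.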
Currently available models in guppy 4.5.2 and rerio:
Guppy: dna_r9.4.1_450bps_modbases_5mc_hac.cfg causes most of the 5mC to be CpGs even in bacteria... either we stumbled onto some amazing bacterial biology or it's just not working. (NB this 5mC model is the same as the 5mC one in rerio)
Rerio: res_dna_r941_min_modbases-all-context_v001.cfg gets a lot closer but below you can see that the methylations are more spread and less accurate.
EDIT: Bottom panel is with the rerio model, top panel is with your pre-basecalled dataset (sorry got that switched up pre-edit)
Got to be honest, it's a little worrisome how much difference the basecalling model can make. Definitely something I need to spend more time looking into. But at least this is a quick fix for other people who have the latest version of guppy and were struggling like me.
And thanks again for putting this package together.
Hi,
I think your scripts are great and it is cool that you provided a pretrained modification-aware model. I would like to ask some questions regarding modPhred.
I've encountered a few issues while testing your pipeline on my DRS data. Could not solve them by myself, so here I am.
First of all, I cloned the repo and installed all dependencies according to your instructions. Yet I was unable to run the pipeline (all your scripts at once) with on-the-fly basecalling, using either my own data or the test dataset pointed out in the docs. The problem is the "~/src/modPhred/run" part of the command: I get an error message saying that there is no such file or directory. And, actually, there is no file called run or run.py within the repo; I checked multiple times. I ran the command from bash and from python, and nothing worked. Am I missing something?
(my bash version: 5.0.17; my python version: 3.8.3, pyguppyclient version:0.0.6)
Nevertheless, I was able to run the scripts one by one on your pre-basecalled test data (~/PRJEB22772/MARC_ZFscreens_R9.4_1D-Ecoli) and obtain proper output.
So I decided to basecall my own RNA data with guppy 3.5.1 and your pretrained model (rna_r9.4.1_70bps_m6A-m5C-5hmC_hac.cfg). Since I could not do it on the fly, I basecalled my dataset standalone, with --trim_strategy none --reverse_sequence on --u_substitution on --fast5_out. Then I used the output as input to modPhred.
Unfortunately, guppy_align & mod_report do not work as expected. The produced BAM file contains the expected total number of reads, but according to samtools flagstat & IGV, none of them is mapped to the reference genome (I have also basecalled the same sample with the standard, modification-unaware ONT model, and the majority of reads were reported as mapped). In fact, I have tried multiple samples from various organisms and the result was always the same: the entire dataset in the BAM file was considered "unmapped", the resulting mod.gz and BED file were empty, and the plot could not be produced because the script was unable to map the reads.
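For comparing several samples, the mapped-read check can be scripted by parsing the text that samtools flagstat prints. This is a minimal sketch: the helper name `mapped_counts` is made up, it reads only the QC-passed counts, and the example text is invented to illustrate the "nothing mapped" case rather than copied from a real run.

```python
import re


def mapped_counts(flagstat_text):
    """Return (total, mapped) QC-passed read counts from samtools flagstat text."""
    counts = {"in total": None, "mapped": None}
    for line in flagstat_text.splitlines():
        # Lines look like "<passed> + <failed> mapped (...)"
        match = re.match(r"(\d+) \+ (\d+) (in total|mapped) ", line)
        if match:
            counts[match.group(3)] = int(match.group(1))
    return counts["in total"], counts["mapped"]


# Invented example resembling a BAM file in which no read was mapped
example = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 mapped (0.00% : N/A)
"""
total, mapped = mapped_counts(example)
print(total, mapped)
```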
Could you provide an example of usage on RNA data and/or the guppy command line? I would appreciate that very much.
Best,
N.
Hi,
When I run modPhred on two pre-basecalled samples, it gets as far as aligning and then fails silently: it tries to continue to samtools sorting, but no BAM exists.
When I look in the minimap2 workspace.log I can see only two lines:
[M::mm_idx_gen::49.955*1.50] collected minimizers
[M::mm_idx_gen::56.698*3.16] sorted minimizers
Running guppy_align.py by itself I see similar behaviour: minimap2 processes keep running even after the guppy_align.py script has terminated, with no output other than a log like the one above.
Is there an easy way of manually aligning and sorting with samtools to move on with the later steps of the pipeline?
I've run the samples individually with no issue but would like to compare and contrast them. Is there a way I can concatenate the individual runs together to do some visualisations?
Thanks for any help!
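As a rough sketch of what a manual align-and-sort step could look like, the helper below only builds the shell command strings for a generic minimap2 and samtools pipeline. It is not taken from guppy_align.py, whose options may well differ, and it is an open assumption whether the later modPhred steps accept a BAM produced this way. The helper name and all file names are placeholders.

```python
import shlex


def align_and_sort_commands(ref, fastq, bam, threads=4):
    """Build shell commands for a generic minimap2 | samtools sort | index run."""
    # map-ont is minimap2's preset for Oxford Nanopore genomic reads
    align = ["minimap2", "-ax", "map-ont", "-t", str(threads), ref, fastq]
    # "-" makes samtools sort read the SAM stream from standard input
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, "-"]
    index = ["samtools", "index", bam]
    pipeline = " | ".join(shlex.join(cmd) for cmd in (align, sort))
    return [pipeline, shlex.join(index)]


# Placeholder file names, for illustration only
for command in align_and_sort_commands("ref/ECOLI.fa", "reads.fq.gz", "out.bam"):
    print(command)
```

Printing the commands instead of running them makes it easy to inspect them first and to run them by hand while watching the minimap2 log.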
Even though I used the singularity image I got this error when running mod_cluster.py:
Traceback (most recent call last):
File "/users/enovoa/scruciani/soft/modPhred/src/mod_cluster.py", line 21, in <module>
from sklearn.decomposition import PCA
ModuleNotFoundError: No module named 'sklearn'
--> Should scikit-learn be added to the image?
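A quick way to confirm which modules are missing from a given Python environment, such as the one inside the image, is to ask importlib whether it can locate them. This is a generic sketch using only the standard library; the helper name `missing_modules` is made up, and only `sklearn` comes from the traceback above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# sklearn is the import that failed in the traceback above
print(missing_modules(["sklearn"]))
```

An empty list means the module can be found; a list containing `sklearn` reproduces the problem without running mod_cluster.py.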
Hello,
We are getting an error on Linux when running your test example that uses the guppy basecaller. The test example with pre-basecalled data works fine, but when running the second example:
run -f ref/ECOLI.fa -o PRJEB22772 -i PRJEB22772/* -t4 \
--host /guppy/4.2.2/bin/guppy_basecall_server
we get the following error:
ConnectionError: Connect with 'dna_r9.4.1_450bps_modbases_dam-dcm-cpg_hac' failed: [bad_reply] Could not interpret message from server for request: LOAD_CONFIG. Reply: INVALID_PROTOCOL
[2022-11-10 05:37:06.553666] [0x00002aaab5e8e700] [info] Connection error. [bad_reply] Could not interpret message from server for request: LOAD_CONFIG. Reply: INVALID_PROTOCOL
Could you please provide some guidance on this error?
Thanks!
Dear developers,
I'm trying to run the mods_from_bams.py script.
I run the following command:
(modPhred) diaz@Uwe2:/media/data/nuria/bcn_nanopore/files$ ~/src/modPhred/src/mods_from_bams.py -i sample1.bam sample2.bam -f GRCm39.genome.fa -o ./sample1_vs_sample2
In the "modPhred" conda environment I have all the dependencies described in your documentation.
However I get the following error:
[2023-08-01 17:47:04] ===== Welcome, welcome to modPhred pipeline! =====
Traceback (most recent call last):
File "/home/diaz/src/modPhred/src/mods_from_bams.py", line 86, in <module>
main()
File "/home/diaz/src/modPhred/src/mods_from_bams.py", line 59, in main
MaxPhredProb = data["MaxPhredProb"]
KeyError: 'MaxPhredProb'
Could you be so kind as to give me a hint on how to run the code?
Thanks in advance and best regards,
Núria
Hi everyone,
I just installed modPhred via conda, and would now like to run the test data.
I would expect that I have the relevant scripts like modEncode or modAlign in the path... but I do not.
So... how do I run them/where are they?
Thanks,
Bastian