bmds-lab / crackling
CRISPR, faster, better – The Crackling method for whole-genome target detection
License: BSD 3-Clause "New" or "Revised" License
If no value is given for the RNAfold -o flag, RNAfold writes its output to a file named RNAfold_output.fold.
In the code below, no filename is provided. If multiple instances of Crackling are launched from the same directory and several reach the folding step at the same time, one RNAfold process may overwrite the output of another. I ran into this issue when running many instances of Crackling on an HPC.
To fix this, an output file should be specified explicitly so that the default name is not used.
Crackling/src/crackling/Crackling.py
Lines 426 to 440 in b00de36
I used this code as my temporary fix:
runner('{} --noPS -j{} -i {} -o {}'.format(
        configMngr['rnafold']['binary'],
        configMngr['rnafold']['threads'],
        configMngr['rnafold']['input'],
        configMngr['rnafold']['output']
    ),
    shell=True,
    check=True
)
# os.replace('RNAfold_output.fold', configMngr['rnafold']['output'])
printer('\t\tStarting to process the RNAfold results.')
RNAstructures = {}
with open(configMngr['rnafold']['output'], 'r') as fRnaOutput:
When running the extract off-targets script on Windows, an OS error may be thrown when the script tries to move the temporary file.
The temporary file is never closed, so Windows refuses to move it.
The error occurs here:
According to the Python docs, this behavior is expected:
tempfile.NamedTemporaryFile(mode='w+b', buffering=-1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, delete=True, *, errors=None)
This function operates exactly as TemporaryFile() does, except that the file is guaranteed to have a visible name in the file system (on Unix, the directory entry is not unlinked). That name can be retrieved from the name attribute of the returned file-like object. Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows). If delete is true (the default), the file is deleted as soon as it is closed. The returned object is always a file-like object whose file attribute is the underlying true file object. This file-like object can be used in a with statement, just like a normal file.
One way to fix this is by closing the temporary file inside the while loop:
Crackling/src/crackling/utils/extractOfftargets.py
Lines 161 to 191 in bb50af9
mergedFile.close()
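As a rough illustration of the close-before-move pattern (not the code in extractOfftargets.py; the function name write_then_move and its arguments are hypothetical), a temporary file can be created with delete=False, closed explicitly, and only then moved:

```python
import os
import shutil
import tempfile


def write_then_move(lines, destination):
    """Write lines to a named temporary file, close it, then move it.

    Hypothetical helper for illustration only.
    """
    # delete=False keeps the file on disk after close(), so it can be moved.
    tmp = tempfile.NamedTemporaryFile(mode='w', delete=False)
    try:
        tmp.writelines(lines)
    finally:
        # Closing releases the handle; Windows will not move an open file.
        tmp.close()
    shutil.move(tmp.name, destination)
    return destination
```

Because the handle is released before shutil.move is called, the same code path should behave the same way on Windows and on Unix.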
Dear authors,
g++ -o search_ots_score search_ots_score.cpp -O3 -std=c++11 -fopenmp -mpopcnt
fails with
34 | #include <phmap.h>
| ^~~~~~~~~
compilation terminated.
despite phmap.h being present.
What am I doing wrong?
The extract off-target sites utility crashes if only one FASTA sequence is provided. In that case only one intermediate file exists. The intermediate files are sorted and then merged; the sort succeeds but the merge does not.
I do not believe there will be an issue when the input is either (1) multiple FASTA files or (2) a single multi-FASTA file.
The crash is caused by a variable being referenced before assignment:
Line 191 is outside the while loop, so mergedFile may never be assigned.
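The failure mode can be sketched as follows. This is a simplified stand-in for the merge step, not the utility's actual code: merge_sorted_files, fan_in and created are hypothetical names, and each input is assumed to be a sorted text file whose lines all end in a newline. The point is that nothing after the loop depends on a name assigned only inside it, so a single input file is handled without error.

```python
import heapq
import os
import shutil
import tempfile


def merge_sorted_files(paths, destination, fan_in=2):
    """Merge already-sorted text files into destination, fan_in at a time."""
    if fan_in < 2:
        raise ValueError('fan_in must be at least 2')
    queue = list(paths)
    if not queue:
        raise ValueError('no input files to merge')
    created = set()  # temporary files made by this function

    # With a single input the loop body never runs, so nothing after the
    # loop may rely on a name that is only assigned inside it.
    while len(queue) > 1:
        batch, queue = queue[:fan_in], queue[fan_in:]
        handles = [open(p, 'r') for p in batch]
        merged = tempfile.NamedTemporaryFile(mode='w', delete=False)
        try:
            merged.writelines(heapq.merge(*handles))
        finally:
            for handle in handles:
                handle.close()
            merged.close()  # closed inside the loop, before it is reused
        for p in batch:
            if p in created:
                os.remove(p)  # drop intermediates that are no longer needed
                created.discard(p)
        created.add(merged.name)
        queue.append(merged.name)

    final = queue[0]
    if final in created:
        shutil.move(final, destination)
    else:
        shutil.copyfile(final, destination)  # leave the caller's input intact
    return destination
```

Calling merge_sorted_files with a one-element list skips the loop entirely and copies that file to the destination.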
The function main() in this file is intended to be called from the command line, but it is never called.
When I run python3.9 trainModel.py, main() should run, but it does not.
Crackling/src/crackling/utils/trainModel.py
Line 119 in b00de36
Adding the following code to the bottom of the script is a potential solution:
if __name__ == '__main__':
    main()
When the elapsed time for a batch and the total elapsed time are reported, the expression %d %H:%M:%S is used. Even when the pipeline takes less than one day to run, the elapsed time is reported with %d equal to one (i.e., as if it took at least 24 hours), because %d reports the day of the month and there is no zeroth day of a month.
The code used to report elapsed time needs to be changed so that days, hours, minutes and seconds are reported accurately.
Crackling/src/crackling/Crackling.py
Line 886 in b00de36
Crackling/src/crackling/Crackling.py
Line 879 in b00de36
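One possible way to report the duration, sketched here with a hypothetical helper named format_elapsed, is to split the elapsed seconds with divmod instead of passing them through a date formatter. The input is assumed to be a non-negative number of seconds.

```python
def format_elapsed(seconds):
    """Format a duration in seconds as 'dd hh:mm:ss' with days from zero."""
    total = int(seconds)  # fractional seconds are dropped
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return '{:02d} {:02d}:{:02d}:{:02d}'.format(days, hours, minutes, secs)


print(format_elapsed(2421.1502072811127))  # prints 00 00:40:21
```

Unlike %d, the day field here counts whole elapsed days, so a run shorter than 24 hours shows 00.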
Some of the stdout log is inconsistently formatted or incorrect. See the examples below.
In several places large numbers are displayed with a , separating thousands; in others no , is used.
>>> 2022-02-17 16:14:29:442676: Done.
>>> 2022-02-17 16:14:29:442723: 2500000 guides evaluated.
>>> 2022-02-17 16:14:29:442841: This batch ran in 01 00:40:21 (dd hh:mm:ss) or 2421.1502072811127 seconds
>>> 2022-02-17 16:14:29:442913: Processing batch file 2 of 7
>>> 2022-02-17 16:14:35:888107: Loaded 2,500,000 guides
>>> 2022-02-17 16:14:35:888266: CHOPCHOP - remove those without G in position 20.
>>> 2022-02-17 16:15:13:670180: 773,162 of 1,043,323 failed here.
...
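A small sketch of one way to make the counts consistent: route every number through a single helper that uses Python's ',' format specifier. The helper name format_count is hypothetical and the message text is taken from the log above.

```python
def format_count(n):
    """Return an integer with ',' as the thousands separator."""
    return '{:,}'.format(n)


print('{} guides evaluated.'.format(format_count(2500000)))
# prints: 2,500,000 guides evaluated.
```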
In the run that produced this output, the batch size was set to 2.5 million guides, yet the second line below says each page will contain 5 million guides.
The page size cannot be larger than the batch size.
I believe pageSize = min(batchSize, maxPageSize, actualPageSize) would be more accurate.
Also, notice the lack of commas again.
>>> 2022-02-17 16:17:10:869049: mm10db - check secondary structure.
>>> 2022-02-17 16:17:49:964502: Processing page 1 (5,000,000 per page).
>>> 2022-02-17 16:17:49:964671: Constructing the RNAfold input file.
>>> 2022-02-17 16:17:50:481058: 678,423 guides in this page.
...
>>> 2022-02-17 16:24:15:592360: Calculating mm10db final result.
>>> 2022-02-17 16:24:18:294110: 426284 accepted.
>>> 2022-02-17 16:24:18:294204: 2073716 failed.
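The suggested page size calculation might look like the sketch below. The function and parameter names are hypothetical, and actualPageSize is assumed to mean the number of guides that actually reach the paging step; the values are taken from the log above.

```python
def effective_page_size(batch_size, max_page_size, guides_in_page):
    """Return the page size to report: never more than the batch holds."""
    return min(batch_size, max_page_size, guides_in_page)


size = effective_page_size(2500000, 5000000, 678423)
print('Processing page 1 ({:,} per page).'.format(size))
# prints: Processing page 1 (678,423 per page).
```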
Hi Crackling developers,
Thanks for making Crackling available.
I ran into an issue while trying to install it.
The error messages are listed below:
g++ -O3 -std=c++11 -fopenmp -mpopcnt -Isrc/ISSL/include -o bin/isslScoreOfftargets src/ISSL/isslScoreOfftargets.cpp
src/ISSL/isslScoreOfftargets.cpp: In function ‘int main(int, char**)’:
src/ISSL/isslScoreOfftargets.cpp:193:14: warning: ignoring return value of ‘size_t fread(void*, size_t, size_t, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]
193 | fread(&mask, sizeof(uint64_t), 1, fp);
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ISSL/isslScoreOfftargets.cpp:194:14: warning: ignoring return value of ‘size_t fread(void*, size_t, size_t, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]
194 | fread(&score, sizeof(double), 1, fp);
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/ld: cannot open output file bin/isslScoreOfftargets: No such file or directory
collect2: error: ld returned 1 exit status
make: *** [Makefile:13: isslScoreOfftargets] Error 1
Can you please help me figure it out?
Thanks a lot in advance.
Best,
Huanle