maxibor / adrsm
Ancient DNA Read Simulator for Metagenomic
License: MIT License
Some clusters may have computing nodes that are offline.
@maxibor says that adrsm currently uses the JGI API to convert a TAXID to a species name.
Therefore errors such as
requests.exceptions.ConnectionError: HTTPConnectionPool(host='taxonomy.jgi-psf.org', port=80): Max retries exceeded with url: /tax/pt_name/GCA_013267415 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b5d41b17668>: Failed to establish a new connection: [Errno 110] Connection timed out',))
can happen if no external internet access is available.
An option to run the tool offline (e.g. using the ete3 toolkit) would be helpful for this use case.
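One way the offline lookup could work, as a minimal sketch: read the names from a local copy of the NCBI taxonomy dump instead of calling the JGI API. This is not adrsm's code. The function names `load_scientific_names` and `taxid_to_name` are hypothetical, and the sketch assumes a `names.dmp` file from the NCBI taxdump archive has already been copied to the cluster.

```python
def load_scientific_names(names_dmp_lines):
    """Build a {taxid: scientific name} dict from NCBI names.dmp lines.

    Each names.dmp line holds four fields separated by "|":
    tax_id, name_txt, unique name, name class.
    """
    names = {}
    for line in names_dmp_lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        taxid, name, _unique_name, name_class = fields[:4]
        # A taxid has many entries (synonyms, common names); keep one.
        if name_class == "scientific name":
            names[int(taxid)] = name
    return names


def taxid_to_name(taxid, names, default="unknown"):
    """Hypothetical offline replacement for the web lookup."""
    return names.get(int(taxid), default)
```

Usage would be along the lines of `with open("names.dmp") as fh: names = load_scientific_names(fh)`, done once at start-up so that no network access is needed afterwards.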
As on the tin :D, in this day and age, uncompressed files are a big no-no
Another factor that could be considered when generating in silico aDNA data is sequencing effort.
Currently adrsm will generate sequencing reads that are equivalent to a 'true sample', i.e., one that contains the original full genomes at the requested genomic depth.
However, sequencing experiments themselves have a fixed limit which is the capacity of the machine (or the amount of DNA actually input into the machine).
I would like to request the option for adrsm to output a user-defined number of reads, for example only 10 million pairs in the output files. To simulate sequencing, these reads should be sampled randomly from the initially sheared reads (so that the sampling probability reflects the varying genome coverage).
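A rough sketch of what this could look like, assuming the simulated read pairs from all genomes are pooled into one stream before writing. Drawing uniformly from that pool means a genome simulated at higher coverage contributes proportionally more pairs to the subsample. The function name `subsample_read_pairs` is hypothetical and not part of adrsm. Reservoir sampling is used here only so that the full pool never has to sit in memory.

```python
import random


def subsample_read_pairs(read_pairs, n_pairs, seed=None):
    """Return up to n_pairs read pairs drawn uniformly at random.

    read_pairs can be any iterable (e.g. a generator over the pooled,
    already-sheared reads); it is consumed exactly once.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, pair in enumerate(read_pairs):
        if i < n_pairs:
            # Fill the reservoir with the first n_pairs items.
            reservoir.append(pair)
        else:
            # Keep item i with probability n_pairs / (i + 1).
            j = rng.randint(0, i)
            if j < n_pairs:
                reservoir[j] = pair
    return reservoir
```

If fewer pairs are available than requested, all of them are returned, which matches the current behaviour of emitting the full 'true sample'.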
Some of the 'genomes' I downloaded and tried to input were actually unfinished assemblies:
e.g. GCA_001057035.1_ASM105703v1_genomic.fna
These contained symbols such as W/R/S in the sequence itself.
I would suggest that, if this is encountered, adrsm reports a clear error (rather than a traceback) stating that such symbols aren't accepted and asking the user to fix the input.
You could also suggest the following command to clean it up:
sed -i '/^[^>]/s/[RYWSMKHBVD]/N/g' *.fna
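The "nice error" could be produced by a small pre-flight check along these lines. This is only a sketch: the function name `check_fasta_bases` is hypothetical, and it assumes that A, C, G, T and N (in either case) are the only symbols adrsm should accept.

```python
import re

# Anything other than A, C, G, T or N counts as unsupported (assumption).
UNSUPPORTED_BASE = re.compile(r"[^ACGTN]", re.IGNORECASE)


def check_fasta_bases(lines, filename="<input>"):
    """Raise ValueError with a readable message on the first bad base."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(">"):
            continue  # header lines may contain any characters
        match = UNSUPPORTED_BASE.search(line.strip())
        if match:
            raise ValueError(
                f"{filename}, line {lineno}: unsupported base "
                f"{match.group()!r}. Only A, C, G, T and N are accepted; "
                "please replace ambiguity codes with N and try again."
            )
```

Calling this on each input genome before simulation starts, and printing the message instead of the traceback, would tell the user exactly which file and line to fix.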