Comments (10)
From the 400-odd species in the set of ribosomal transcripts, I created clusters based on the taxonomic rank "class" to obtain the most diverse subset of species, and picked one species from each of the 50 clusters. I then mined the SRA for experiments that are single-end or paired-end and fall into one of the specified read-length intervals (<30, <80, <130, <180). I obtained 300+ results, of which only 83 had a PubMed ID associated with them; those are the records I appended to the HTSinfer test data table. I haven't gone through the publications yet, and it might be nice if another pair of eyes checked them. It's no problem for me to search the publications for library and adapter info; I'm just not sure whether we need so many test datasets, or whether we might want even more.
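For readers who want to reproduce a selection like this, here is a minimal sketch of the "one species per taxonomic class" step and of the read-length binning. It is not the script that was actually used: the function names, the input dictionary and the interpretation of the four thresholds as upper bounds of consecutive bins are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Upper bounds of the read-length intervals named in the comment
# (assumed here to define consecutive bins: <30, 30-79, 80-129, 130-179).
LENGTH_BOUNDS = (30, 80, 130, 180)


def pick_one_per_class(species_to_class, seed=0):
    """Group species by taxonomic class and pick one species per group.

    `species_to_class` is a hypothetical mapping such as
    {"hsapiens": "Mammalia", "drerio": "Actinopteri"}.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    clusters = defaultdict(list)
    for species, tax_class in species_to_class.items():
        clusters[tax_class].append(species)
    # Sort members so the pick does not depend on dictionary order.
    return {
        tax_class: rng.choice(sorted(members))
        for tax_class, members in sorted(clusters.items())
    }


def length_interval(read_length):
    """Return the label of the first interval the read length falls below."""
    for bound in LENGTH_BOUNDS:
        if read_length < bound:
            return f"<{bound}"
    return None  # longer than every specified interval
```

With 50 distinct classes in the input, `pick_one_per_class` would return 50 species, one per cluster; `length_interval` can then be used to keep only experiments whose read length falls into one of the specified intervals.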
from htsinfer.
Awesome job @BorisYourich!
Actually, we were planning to spend an afternoon session with 10 people or so mining, so that the work isn't so dull on any one person. Your work is absolutely perfect for that, because all we need to do is go through the publications and check for the info we need (sequencing kit, most importantly, or adapter, length, orientation directly, if available).
FYI @mzavolan
Hey, I am sorry I didn't collect all the samples yet; I had a very hectic two weeks onboarding on the new project. I have dedicated tomorrow to this project, though, and will run the mining scripts and share all the results.
I think we have done this, no, @balajtimate? I will close it, please re-open if you think it's not solved.
Alright, I am happy you like it, glad to have helped. Good luck with mining the publications!
Looking at the list of organisms that you ended up sampling, I thought that maybe we could also have some more representation by the more common organisms and closely related species. Would it be easy for you to go through the same selection process for a given list of species?
If so, could you do this for the following?
# primates
cjacchus
hsapiens
mmulatta
pabelii
panubis
ppaniscus
ptroglodytes
# livestock
btaurus
chircus
ecaballus
ggallus
oaries
sscrofa
# rodents
cporcellus
mauratus
mmmarmota
mmusculus
ocuniculus
rnorvegicus
# fungi
pmexicana
scerevisiae
spombe
# plants
athaliana
bvulgaris
cjaponica
osativa
zmays
# worms
celegans
cbriggsae
# fish
drerio
olatipes
xmaculatus
# insects
amellifera
bmori
dmelanogaster
# amphibians & reptiles
acarolinensis
xtropicalis
Sure, no problem, though I won't manage today. Hopefully it will be ready tomorrow.
Great. Tomorrow is totally fine. In fact, the earliest we will need it is next Friday.
Ah, before I forget: We kinda need the SRR identifiers of the actual libraries. We can easily get them from the SRX identifiers, so no need to change that for the available ones. But if it's easy to include them for the new ones, then it would be good to include them.
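As a rough illustration of the SRX-to-SRR lookup, the sketch below builds the mapping from a RunInfo-style CSV export. It assumes the export has `Run` and `Experiment` columns holding the SRR and SRX accessions; the function name is made up for this example, and fetching the CSV from the SRA is deliberately left out.

```python
import csv
import io
from collections import defaultdict


def runs_by_experiment(runinfo_csv_text):
    """Map each experiment accession (SRX) to its run accessions (SRR).

    Assumes a CSV with "Run" and "Experiment" columns; rows missing
    either value are skipped.
    """
    mapping = defaultdict(list)
    for row in csv.DictReader(io.StringIO(runinfo_csv_text)):
        run = (row.get("Run") or "").strip()
        experiment = (row.get("Experiment") or "").strip()
        if run and experiment:
            mapping[experiment].append(run)
    return dict(mapping)
```

One experiment can have several runs, which is why the values are lists rather than single identifiers.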
@BorisYourich: did you have a chance to look at this?