Comments (10)
Hi @hasindu2008, I think we could definitely do that for one dataset first. If you would like to contribute, we could even host a short tutorial on how to use and query the BLOW5 format with this one file. See here for examples of short tutorials: https://github.com/GoekeLab/sg-nex-data#data-analysis-tutorials-and-workflows
from sg-nex-data.
Thank you very much for being open to this. Which sample would you recommend? I can do the conversion and share the file with you to host, and then I will experiment with it and determine the best parameters for querying a dataset hosted on AWS.
from sg-nex-data.
Sounds good! You could use this file here, which is one of the direct RNA-Seq samples: s3://sg-nex-data/data/sequencing_data_ont/fast5/SGNex_K562_directRNA_replicate4_run1/SGNex_K562_directRNA_replicate4_run1.tar.gz
from sg-nex-data.
I have converted that dataset, basecalled it, and run some sanity checks, such as comparing read counts and confirming that the read IDs are unique. How would you prefer that I provide the file for you to upload to AWS S3?
from sg-nex-data.
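Editor's note: the sanity checks mentioned above (read counts and read ID uniqueness) could be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in the thread. The function name `check_read_ids` and the sample IDs are hypothetical, and the sketch assumes the read IDs have already been extracted from the original FAST5 files and the converted BLOW5 file by some other tool.

```python
from collections import Counter
from typing import Iterable


def check_read_ids(read_ids: Iterable[str], expected_count: int) -> dict:
    """Count reads, find duplicated IDs, and compare against an expected total."""
    counts = Counter(read_ids)
    total = sum(counts.values())
    duplicates = sorted(rid for rid, n in counts.items() if n > 1)
    return {
        "total": total,            # number of reads seen, duplicates included
        "unique": len(counts),     # number of distinct read IDs
        "duplicates": duplicates,  # IDs that appear more than once
        "ok": total == expected_count and not duplicates,
    }


# Hypothetical IDs standing in for those listed from the FAST5 and BLOW5 files.
fast5_ids = ["read-a", "read-b", "read-c"]
blow5_ids = ["read-a", "read-b", "read-c"]

report = check_read_ids(blow5_ids, expected_count=len(fast5_ids))
print(report)
```

A conversion passes this check only when the converted file has the same number of reads as the source and no read ID occurs twice.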
@jonathangoeke, just a ping in case you forgot this :)
from sg-nex-data.
Hi @hasindu2008, how big is the converted file in BLOW5 format? If it's not too big, we can provide you with a Dropbox link so that you can share it with us. By the way, could you also share your email address so that we can send you the Dropbox link?
from sg-nex-data.
@cying111
As suggested, I converted SGNex_K562_directRNA_replicate4_run1/SGNex_K562_directRNA_replicate4_run1. The converted file is around 50 GB (originally 74 GB).
I temporarily uploaded it to my AWS S3 space. Could you check whether you can copy it directly to your S3 bucket?
BLOW5 file and index: https://slow5test.s3.amazonaws.com/tmp/blow5/SGNex_K562_directRNA_replicate4_run1/
I could convert the whole dataset and provide links like the one above if that is convenient.
If you are interested, here are the basecalls for that dataset from a recent Guppy version, 6.3.7:
https://slow5test.s3.amazonaws.com/tmp/guppy_6.3.7_hac_fastq/SGNex_K562_directRNA_replicate4_run1/
If you think it is useful, I should be able to rebasecall the whole dataset conveniently after converting to BLOW5.
from sg-nex-data.
Great! Could you send me ([email protected]) the download paths for the BLOW5 file and index? I don't think the provided link is usable.
Could you also send me the download path for the newly basecalled FASTQ file? I will take a look and get back to you.
from sg-nex-data.
Many thanks @hasindu2008 @cying111! All files are now available as BLOW5 with the latest release v.0.4.0 #33
from sg-nex-data.
Great. Thanks for supporting this.
from sg-nex-data.