
sg-nex-data's People

Contributors

cafelton, cying111, cyusong, hasindu2008, jonathangoeke, josiegleeson, lingminhao, n-hoffmann, yalanbi, yuukiiwa

sg-nex-data's Issues

Short-read data

Hi,

I'm trying to understand which files belong to the short-read sequencing data. Is it the ones that say "cDNA" or "cDNAStranded"? When looking at read lengths, both types of files have reads longer than 300 nt, so I'm not sure how to distinguish PCR-cDNA from short-read cDNA.

Thanks

fast5 file unable to download now

Hi, I am doing some analysis on RNA modifications and I need to download the raw fast5 files for the dRNA-seq data, but the link is no longer available. Another question: when will your lab release the other cell lines?

Raw data in BLOW5 format

Hi

This is a very useful dataset, but sadly, because the raw data are stored as tar.gz archives, there is no way to grab signals for particular read IDs without downloading and extracting all the tar.gz files. Would you be able to host raw data in BLOW5 format, at least for one dataset to begin with?

Questions about sequin used

Hi, thank you for providing such a wonderful resource to the community. I am trying to analyze the data to compare the performance of multiple quantification programs. To do that, I downloaded cDNA PCR sequencing data (SQK-PCS109) and short-read data with sequins included. I first ran Anaquin (sequin analysis software) to check the consistency between expected and estimated expression (Kallisto is used by Anaquin). However, the correlation was very poor (around 0.1). Then I realized that Anaquin provides sequin version 2.4, but the Excel file provided with the original manuscript states that sequin version 1 was used. Do the two versions differ in the transcripts used and their concentrations? I tried to find the sequin version 1 reference files (decoy chromosome, gene annotation in GTF) but couldn't find any. I visited the sequinstandard web site and tried to access the resources there, but I can't: I have to log in to access the files and the site won't let me register (I don't know why). Could you provide the reference files for the sequins used in the study, including the file that contains the expected concentrations? Thank you and have a nice day.
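For anyone reproducing the expected-versus-estimated comparison described above outside of Anaquin, here is a minimal sketch, not the method used by Anaquin or by the SG-NEx authors. It assumes two plain dictionaries mapping sequin IDs to expected concentration and to estimated abundance; the IDs, the pseudocount, and the function names are all hypothetical. It also reports how many IDs the two tables share, since a reference version mismatch could show up as a small overlap.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Plain Pearson correlation; returns NaN if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def log_correlation(
    expected: Dict[str, float],
    estimated: Dict[str, float],
    pseudocount: float = 1.0,  # assumption: avoids log2(0) for undetected sequins
) -> Tuple[int, float]:
    """Return (number of shared IDs, Pearson r on log2-transformed values)."""
    shared = sorted(set(expected) & set(estimated))
    if len(shared) < 2:
        raise ValueError("fewer than two sequin IDs are shared between the tables")
    xs = [math.log2(expected[key] + pseudocount) for key in shared]
    ys = [math.log2(estimated[key] + pseudocount) for key in shared]
    return len(shared), pearson(xs, ys)


# Hypothetical IDs and values, for illustration only.
expected = {"seqA": 1.0, "seqB": 3.0, "seqC": 7.0, "seqD": 15.0}
estimated = {"seqA": 1.0, "seqB": 3.0, "seqC": 7.0}
shared_count, r = log_correlation(expected, estimated)
print(shared_count, r)
```

If the shared count is much smaller than the number of sequins in either table, the two tables probably do not describe the same sequin mix, and the correlation itself is not meaningful.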

Two fastq files are not correctly formatted

Hi team,

I have downloaded some cDNA fastq files from your S3 repo.
I found that 2 files are not correctly formatted when I ran QC with NanoPlot.

SGNex_MCF7_cDNAStranded_replicate2_run1/SGNex_MCF7_cDNAStranded_replicate2_run1.fastq.gz
SGNex_K562_cDNAStranded_replicate3_run3/SGNex_K562_cDNAStranded_replicate3_run3.fastq.gz

The first one has additional strings before the @ character of the first read.

fastq_fail/FAK34234_679ea2e77287c6ea3bab84c69ca16d29e5d9c760_228.fastq000666 001750 001750 00010735421 13424777162 023424 0ustar00gridgrid000000 000000 @0185f0c7-c4a5-40fb-9ac2-6907653a86a5 runid=679ea2e77287c6ea3bab84c69ca16d29e5d9c760 read=46243 ch=61 start_time=2019-02-01T08:06:48Z flow_cell_id=FAK34234 protocol_group_id=010219_MCF7_mRNA_PCS109 sample_id=010219_MCF7_mRNA_PCS109
ACGGTAATACTTCGGTCTTGTTTCGACAATCGGTCGCTCAGACCGACCGTGGAAC
+
#"*%&$#%"$&"""""$&&#"""""""++*++)/+%#%##'+*$%&'%"##("&$

The second one has a read whose quality string length does not match its sequence length.

@09f55d50-803e-4048-899d-bb2fbdbf9c33 runid=446e90283984afd70d3f9af90262644290c7fca2 read=1796 ch=64 start_time=2019-01-07T07:56:26Z flow_cell_id=FAK11042 protocol_group_id=070119_K562_mRNA_PCS109 sample_id=070119_K562_mRNA_PCS109
TCGGTGATAAAGTGTTAATCGTCGG
+
%"-$&%""""""""$"""""""""

Can you confirm this?
Cheers,
Alex
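Both problems reported above (extra bytes before the `@` of a header, and a quality string whose length differs from the sequence) can be detected with a simple record-by-record check. The following is a minimal sketch, assuming plain four-line FASTQ records with no line wrapping; the function names are hypothetical and this is not part of NanoPlot or the SG-NEx tooling.

```python
import gzip
import io


def check_fastq(handle):
    """Yield (record_number, problem) for each malformed 4-line FASTQ record."""
    record = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        record += 1
        if not header.startswith("@"):
            yield record, "header does not start with '@'"
        if not plus.startswith("+"):
            yield record, "separator line does not start with '+'"
        if len(seq) != len(qual):
            yield record, f"sequence length {len(seq)} != quality length {len(qual)}"


def check_fastq_gz(path):
    """Run the same check on a gzip-compressed FASTQ file."""
    # errors="replace" keeps the scan going if stray binary bytes are present.
    with gzip.open(path, "rt", errors="replace") as handle:
        return list(check_fastq(handle))


# Small in-memory demonstration with a deliberately short quality string.
demo = io.StringIO("@read1\nACGT\n+\nIII\n")
for number, problem in check_fastq(demo):
    print(number, problem)
```

Calling `check_fastq_gz` on a downloaded `.fastq.gz` file returns a list of record numbers and problems, which is empty when every record passes these three checks.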

questions about data accessions

Hello, I have watched your presentation on the Nanopore Tech webinar recently. Thanks for the exciting work you've done and the data availability you provided!

I want to test the gene expression correlation between Nanopore data and Illumina data. Could you please provide the accession number of the Illumina data set that you referred to as "ENCODE GS20" in the webinar?

Thank you very much!

No download for fastq

I tried: aws s3 sync --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fastq/sample_name .

but no download happens. Not sure what I'm missing.
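One possible cause, assuming the command was run exactly as quoted, is that `sample_name` is a placeholder rather than a real directory in the bucket, in which case `aws s3 sync` finds nothing to copy. The sketch below builds the same command for a concrete sample and refuses the placeholder. The helper name is hypothetical, and it is an assumption that sample directories such as the one quoted in the fastq formatting issue sit directly under this prefix.

```python
import shlex
from typing import List

# Prefix taken from the command quoted in the issue above.
BUCKET_PREFIX = "s3://sg-nex-data/data/sequencing_data_ont/fastq"


def build_sync_command(sample_name: str, destination: str = ".") -> List[str]:
    """Return the argument list for `aws s3 sync` for one sample directory."""
    if not sample_name or sample_name == "sample_name":
        raise ValueError("replace the 'sample_name' placeholder with a real sample")
    return [
        "aws", "s3", "sync", "--no-sign-request",
        f"{BUCKET_PREFIX}/{sample_name}",
        destination,
    ]


# Sample name copied from another issue on this page; whether it exists
# under this prefix is an assumption.
command = build_sync_command("SGNex_MCF7_cDNAStranded_replicate2_run1")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine with the AWS CLI installed.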

Number of PCR cycles

I went through the bioRxiv preprint, particularly the methods and supplementary tables, but I could not see any information on the number of PCR cycles used.

There seems to be quite a bit of PCR bias when comparing genes between cDNA and cDNA-PCR. Is this dependent on the number of cycles, or is it more to do with the alignment of genes with repetitive elements?

Sequin spike-in reference

Hi,

My advisor, @rob-p, and I are currently utilizing the SG-NEx dataset for the evaluation of our quantification model. In the course of our research, we came across a reference in the paper "https://doi.org/10.1101/2021.04.21.440736," where it is mentioned that, "For a subset of sequencing runs, we included sequin spike-in RNAs with known concentrations that enable the evaluation of transcript discovery and quantification."

We have been searching for resources that provide information about which specific dataset within SG-NEx contains the sequin spike-in data, as well as the corresponding concentrations of these spike-ins. However, our efforts thus far have not yielded the desired information.

Any guidance or information in locating the source or reference that specifies the dataset containing the sequin spike-in data and their respective concentration values within the SG-NEx dataset would be greatly appreciated.

Regards,
Zahra

Identification of m6A with the SG-NEx samples

Hello! I want to explore m6Anet using the SG-NEx dataset that has been preprocessed for m6Anet. However, it appears that this dataset lacks the data.info component. Could you please advise how I can use this dataset to experiment with m6Anet? Thank you for your assistance.

Data Release?

I was hoping to take a look at the raw data of this project. However, the links to the fastq and bam files in DATA.md don't appear to be working. Are you waiting for publication for that data to be available, despite your posted preprint?

Fast5 files

Hi, I was wondering whether you could provide the fast5 files for your data. I'm doing some polyA analysis on ONT reads, so I'd like to run tailfindr/nanopolish, which require the signal data.

RNA004 data

Dear developers, first of all, thanks a ton for this wonderful resource!

With the recent release of the RNA004 kit, do you also plan to make some RNA004 data available in the database?

Naming of samples

I'm having trouble understanding the naming convention of the samples. Some cell lines have missing replicates (e.g. only 1, 5, 6) and some replicates have only run1 or run2. I would greatly appreciate your help on how to process the samples.
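As a starting point for working out which replicates and runs exist, the sample names can be split into fields and grouped. This is a minimal sketch: the pattern `SGNex_<cell line>_<protocol>_replicate<N>_run<M>` is inferred only from the two file names quoted in the fastq formatting issue on this page, so other samples may not follow it, and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Assumed pattern, inferred from names such as
# SGNex_MCF7_cDNAStranded_replicate2_run1
SAMPLE_PATTERN = re.compile(
    r"^SGNex_(?P<cell_line>[^_]+)_(?P<protocol>[^_]+)"
    r"_replicate(?P<replicate>\d+)_run(?P<run>\d+)$"
)


class Sample(NamedTuple):
    cell_line: str
    protocol: str
    replicate: int
    run: int


def parse_sample_name(name: str) -> Optional[Sample]:
    """Split a sample name into its fields, or return None if it does not match."""
    match = SAMPLE_PATTERN.match(name)
    if match is None:
        return None
    return Sample(
        match["cell_line"],
        match["protocol"],
        int(match["replicate"]),
        int(match["run"]),
    )


def group_runs(names: Iterable[str]) -> Dict[Tuple[str, str, int], List[int]]:
    """Map (cell line, protocol, replicate) to the sorted run numbers present."""
    groups = defaultdict(list)
    for name in names:
        sample = parse_sample_name(name)
        if sample is None:
            continue  # skip names that do not follow the assumed pattern
        key = (sample.cell_line, sample.protocol, sample.replicate)
        groups[key].append(sample.run)
    return {key: sorted(runs) for key, runs in groups.items()}


names = [
    "SGNex_MCF7_cDNAStranded_replicate2_run1",
    "SGNex_K562_cDNAStranded_replicate3_run3",
]
for key, runs in group_runs(names).items():
    print(key, runs)
```

Grouping this way makes gaps visible (for example a replicate with only run 3), but it does not explain why a replicate or run is absent; that still needs an answer from the maintainers.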
