qdata / deepchrome

Bioinformatics16: DeepChrome: Deep-learning for predicting gene expression from histone modifications

Home Page: http://deepchrome.net

License: Apache License 2.0

Lua 35.58% Jupyter Notebook 29.57% Python 34.85%

bioinformatics deep-neural-networks epigenetic-data understanding-computation

deepchrome's Issues

dataset clarity

Hello,
Could you please provide a detailed README describing how the data was generated? I referred to #2, but it's still confusing. Clear step-by-step instructions would benefit everyone.
Could you also explain the toy data? I understand from the paper that five selected histone modifications from REMC were used as x and the gene expression output (+1/-1) as y. I couldn't find the column labelling anywhere (which column denotes what, and which column is the output).

Thanking you in advance.
Purvanshi
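
For readers with the same question, here is a minimal sketch of how such a file could be read once the column order is known. The layout used below (gene ID, bin ID, five histone-mark counts, then the label in the last column) and the order of the marks are assumptions for illustration and are not confirmed in this thread, so check them against your own files. The sample rows are made up.

```python
import csv
import io
from collections import OrderedDict

# ASSUMED column order (not confirmed in this thread):
# gene_id, bin_id, H3K27me3, H3K36me3, H3K4me1, H3K4me3, H3K9me3, label
HISTONE_MARKS = ["H3K27me3", "H3K36me3", "H3K4me1", "H3K4me3", "H3K9me3"]


def load_genes(handle):
    """Group rows by gene: {gene_id: (bins x marks matrix, label)}."""
    genes = OrderedDict()
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        gene_id = row[0]
        counts = [float(v) for v in row[2:2 + len(HISTONE_MARKS)]]
        label = int(row[-1])
        # The label of the first row seen for a gene is kept for that gene.
        matrix, _ = genes.setdefault(gene_id, ([], label))
        matrix.append(counts)
    return genes


# Made-up example with 2 bins per gene (real files would have many more).
sample = io.StringIO(
    "geneA,1,0,2,1,5,0,1\n"
    "geneA,2,1,3,0,7,0,1\n"
    "geneB,1,4,0,0,0,2,0\n"
    "geneB,2,3,0,1,0,1,0\n"
)
genes = load_genes(sample)
for gene_id, (matrix, label) in genes.items():
    print(gene_id, "bins:", len(matrix), "label:", label)
```

Under this assumed layout, x for one gene is the bins-by-marks matrix and y is the single label shared by all of that gene's rows.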

How to get the predictions for each gene?

Hi,

I ran the pipeline on my data smoothly and got the ROC AUC for the train and test sets. However, I am not very familiar with Torch/Lua. How could I obtain the final predictions for each gene in the test set (either the 0/1 label or, better, the probability in [0,1])? I guess this just means adding or modifying a couple of lines of code.

thanks!

PS. It'd be great too if I could obtain the accuracy and confusion matrix for the test set (not only the ROC AUC).
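
As a rough illustration of the second half of this request, the sketch below turns per-gene scores into 0/1 predictions, a confusion matrix and an accuracy. It assumes the scores have already been exported from the model, one per test gene; how to do that in the Torch code is not covered here. The gene IDs, labels, scores and the 0.5 threshold are all hypothetical.

```python
def confusion_from_scores(labels, scores, threshold=0.5):
    """Return (predictions, 2x2 confusion matrix, accuracy).

    confusion[t][p] counts genes with true label t that were predicted as p.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    confusion = [[0, 0], [0, 0]]
    for truth, pred in zip(labels, preds):
        confusion[truth][pred] += 1
    correct = confusion[0][0] + confusion[1][1]
    return preds, confusion, correct / len(labels)


# Hypothetical values standing in for exported model outputs.
gene_ids = ["g1", "g2", "g3", "g4"]
labels = [1, 0, 1, 0]
scores = [0.91, 0.40, 0.35, 0.08]

preds, confusion, accuracy = confusion_from_scores(labels, scores)
for gene_id, score, pred in zip(gene_ids, scores, preds):
    print(gene_id, score, pred)
print("confusion:", confusion, "accuracy:", accuracy)
```

The threshold is a free choice; moving it trades false positives against false negatives, while the ROC AUC summarizes all thresholds at once.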

Reference genes and their expression ... ?

Dear Ritambhara,
After reading your paper, I cannot work out which criterion you used to obtain the 19802 gene-samples that make up your dataset before the train-test-val split.
There is no reference to any annotation file, and the procedure by which you associated those genes with the REMC expression quantifications is even less clear to me.

In the other issue reported here you suggested that gene TSSs were retrieved from the table that can be constructed at this address. Could you please be more precise? The table obtained by setting:

clade: Mammal
genome: Human
assembly: Feb 2009 (GRCh37/hg19)
group: Genes and Gene Predictions
track: UCSC Genes (???)
table: knownGene
region: genome

has 82,960 rows, clearly many more than the genes you investigated.

As far as the gene expression quantifications are concerned, REMC states here that they were built using the GENCODE v10 annotation file. The related files can be found here, but you did not make clear which of those files was used. It's unlikely that you used the file "57epigenomes.RPKM.pc.gz" because it contains fewer genes than those you considered (19795 vs 19802).
Moreover, supposing you used the table retrieved as described above to select reference genes and their TSSs, how did you translate the UCSC IDs into Ensembl IDs in order to pick the right expression line for each gene?

Thank you for the support, I hope you will kindly help me in replicating your experiments.

Cheers,
noired
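
To make the question concrete, here is a minimal sketch of the two steps being asked about: collapsing transcript rows to one TSS per gene, and cutting the window around each TSS into bins. It assumes the 10,000 bp window (+/-5000 bp) and 100 bp bins described in the paper. The transcript-to-gene mapping is a hypothetical dictionary, the IDs and coordinates are made up, and keeping the first transcript seen per gene is an arbitrary choice, not the authors' documented procedure.

```python
def tss_of(strand, tx_start, tx_end):
    """The TSS is txStart on the + strand and txEnd on the - strand."""
    return tx_start if strand == "+" else tx_end


def one_tss_per_gene(transcripts, transcript_to_gene):
    """Keep the first transcript seen for each mapped gene ID (arbitrary rule)."""
    chosen = {}
    for name, chrom, strand, tx_start, tx_end in transcripts:
        gene = transcript_to_gene.get(name)
        if gene is None or gene in chosen:
            continue  # unmapped transcript, or gene already has a TSS
        chosen[gene] = (chrom, strand, tss_of(strand, tx_start, tx_end))
    return chosen


def bins_around(tss, flank=5000, bin_size=100):
    """Split [tss - flank, tss + flank) into consecutive bins.

    No clamping at chromosome ends is done in this sketch.
    """
    return [(s, s + bin_size) for s in range(tss - flank, tss + flank, bin_size)]


# Made-up rows in the style of (name, chrom, strand, txStart, txEnd).
transcripts = [
    ("tx1", "chr1", "+", 10000, 20000),
    ("tx2", "chr1", "+", 10500, 20000),  # second transcript of the same gene
    ("tx3", "chr2", "-", 30000, 45000),
    ("tx4", "chr3", "+", 70000, 80000),  # no gene mapping, so it is skipped
]
transcript_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

tss_table = one_tss_per_gene(transcripts, transcript_to_gene)
bins = bins_around(tss_table["geneA"][2])
print(tss_table)
print(len(bins), bins[0], bins[-1])
```

A many-to-one collapse of this kind is one way a table with tens of thousands of transcript rows can shrink to a much smaller gene list, but the exact rule used for the published dataset remains the open question of this issue.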

Input data

Hello,

Many thanks for the package. I'm getting an error with the attached input. Specifically, the AUC score is returning 'nan' (in train.log). I can't see anything that I'm missing so any help would be appreciated!

Many thanks, Aidan
train.txt
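
One situation in which an AUC is undefined is when the labels contain only one class, since there are then no positive/negative pairs to rank. Whether that applies to the attached file is unknown; the sketch below only shows how to check for it. The `roc_auc` helper is a plain rank-based implementation written for this example, not the function used by the package, and the labels and scores are made up.

```python
import math
from collections import Counter


def roc_auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U); NaN if either class is missing."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return math.nan
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.91, 0.40, 0.35, 0.08]

mixed_labels = [1, 0, 1, 0]
single_class_labels = [0, 0, 0, 0]

print(Counter(mixed_labels), roc_auc(mixed_labels, scores))
print(Counter(single_class_labels), roc_auc(single_class_labels, scores))
```

Counting the labels in the training file, as `Counter` does here, is a quick first check before looking for other causes such as non-numeric or missing values in the input.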

Where

generate data

Hello,
I ran into a problem while trying to reproduce the data. Could you please help me solve it? The details are below.
I downloaded a dataset from REMC and followed the README instructions on it. I ran this command:

bedtools bedtobam -i E128-H3K4me3.tagAlign -g hg19chrom.sizes > ft2.bam

but I got this error message:

Error: The requested genome file (hg19chrom.sizes) could not be opened. Exiting!

So I downloaded the file “hg19.chrom.sizes” and ran this command:

bedtools bedtobam -i E128-H3K4me3.tagAlign -g hg19.chrom.sizes > ft2.bam

The error message then disappeared and the command produced a BAM file, but when I ran “bedtools multicov” I got this error message:

bedtools multicov -bams ft2.bam -bed E128-H3K4me3.tagAlign
Could not open input BAM files.

So I installed "samtools" and ran these commands:

samtools sort ft2.bam > ft2.sort
samtools index ft2.sort
bedtools multicov -bams ft2.sort -bed E128-H3K4me3.tagAlign

The error message then disappeared.
I obtained the RPKM values from: http://egg2.wustl.edu/roadmap/data/byDataType/rna/expression/
How should I use the RPKM values with my files? Also, were the commands I used above correct?

Best Regards
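
On the RPKM part of the question, the sketch below shows one way RPKM values could be turned into binary expression labels. It assumes the median-threshold rule described in the DeepChrome paper (genes above the median RPKM of a cell type are labelled high, the rest low); this thread does not confirm that rule or how ties at the median are handled. The gene names and values are made up.

```python
import statistics


def binarize_by_median(rpkm_by_gene):
    """Label each gene 1 if its RPKM is above the median, else 0.

    Returns (labels, threshold). Treating ties as 0 is an assumption.
    """
    threshold = statistics.median(rpkm_by_gene.values())
    labels = {gene: int(value > threshold) for gene, value in rpkm_by_gene.items()}
    return labels, threshold


# Hypothetical RPKM values for one cell type.
rpkm = {"g1": 0.0, "g2": 1.5, "g3": 12.0, "g4": 3.5}

labels, threshold = binarize_by_median(rpkm)
print("median:", threshold)
print(labels)
```

The resulting label per gene would then be paired with that gene's histone-mark read counts, so the expression file and the count file need to share gene identifiers.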

requirement for the hardware system

   Hi,
   I want to run your code, but I don't know the minimum hardware requirements.
   My computer has a Core i5 CPU and 4 GB of RAM. When I run DeepChrome I get this error message:
   Segmentation fault (core dumped)................] ETA: 0ms | Step: 0ms 
   Does DeepChrome need a GPU?
   Could you please help me solve this error?
   Best Regards

How to combine multiple samples to generate a model

Hi!
I noticed that the csv file in the toy directory seemed to have only one sample. What should I do if I want to combine multiple samples to generate a model?
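
One possible approach, sketched below, is to concatenate the rows of several per-sample files and prefix each gene ID with the sample name so that IDs stay unique. This is an untested idea rather than a method confirmed by the authors: whether the training script accepts such IDs, and whether a model pooled across samples is meaningful, are both open. The sample names and rows are made up, and the gene ID is assumed to be the first column.

```python
def combine_samples(samples):
    """Merge rows from several samples, prefixing gene IDs with the sample name.

    samples: {sample_name: list of rows}, where row[0] is assumed to be the gene ID.
    """
    merged = []
    for name, rows in samples.items():
        for row in rows:
            merged.append([f"{name}_{row[0]}"] + list(row[1:]))
    return merged


# Hypothetical rows: gene_id, bin_id, one count column, label.
samples = {
    "sampleA": [["gene1", 1, 4, 1], ["gene1", 2, 6, 1]],
    "sampleB": [["gene1", 1, 0, 0], ["gene1", 2, 1, 0]],
}

merged = combine_samples(samples)
for row in merged:
    print(row)
```

Keeping the sample name in the ID also makes it possible to split the merged data back by sample later, for example to hold out one sample for testing.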

AUROC and dataset

Hello,
I have two questions.
Do you have any other dataset similar to the toy dataset? I need one in the same format.
When I run DeepChrome on the toy dataset, the final output is:

==> time to learn 1 sample = 2.8497934341431ms
ConfusionMatrix:
[[ 5 0] 100.000% [class: 1]
[ 0 5]] 100.000% [class: 2]

average row correct: 100%
average rowUcol correct (VOC measure): 100%
global correct: 100%
AUROC: 1
==> saving model to /home/msfathalian/deepc/code/results/toy/model.99.net
==> testing on test set:
[=================== 10/10 ===================>] Tot: 8ms | Step: 0ms

==> time to test 1 sample = 0.88992118835449ms
ConfusionMatrix:
[[ 1 1] 50.000% [class: 1]
[ 8 0]] 0.000% [class: 2]

average row correct: 25%
average rowUcol correct (VOC measure): 5.0000000745058%
global correct: 10%
AUROC: 0.4375

The AUROC is very low (0.4375). Why?
Best Regards
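
The percentages in the log above can be reproduced from the confusion matrices themselves, which also shows that the test set holds only 10 samples, so a single misclassified gene moves every metric by a large step. The sketch below is a plain re-implementation for illustration, not the package's own code; it assumes that rows are true classes and that the "VOC measure" is the per-class intersection over union averaged over classes, which matches the numbers printed in the log.

```python
def matrix_metrics(confusion):
    """Return (average row correct, VOC measure, global correct) as fractions.

    Assumes rows are true classes, columns are predictions, and no row is empty.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    diag = [confusion[i][i] for i in range(n)]
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
    avg_row = sum(d / r for d, r in zip(diag, row_sums)) / n
    # Intersection over union per class: hits / (row + column - hits).
    voc = sum(d / (r + c - d) for d, r, c in zip(diag, row_sums, col_sums)) / n
    return avg_row, voc, sum(diag) / total


train_matrix = [[5, 0], [0, 5]]  # from the training log above
test_matrix = [[1, 1], [8, 0]]   # from the test log above

print("train:", matrix_metrics(train_matrix))
print("test: ", matrix_metrics(test_matrix))
```

A perfect score on 10 training samples next to a near-chance score on 10 test samples is the pattern one would expect from a model that has memorized a very small training set.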
