deepchrome's People

Contributors

qiyanjun, rs3zz

deepchrome's Issues

dataset clarity

Hello,
Could you please provide a detailed README describing how the data was generated? I referred to #2, but it's still confusing. Clear steps would be beneficial for everyone.
Could you also explain the toy data? I understand from the paper that five selected histone modifications from REMC were used as x and the gene expression output (+1/-1) as y. I couldn't find the column labelling anywhere (which column denotes what, and which column is the output).

Thanking you in advance.
Purvanshi

How to get the predictions for each gene?

Hi,

I ran the pipeline on my data smoothly and got the ROC AUC for the train and test sets. However, I am not very familiar with Torch/Lua. How could I obtain the final predictions for each gene in the test set (either the 0/1 label or, better, the probability in [0,1])? I guess this just means adding or modifying a couple of lines of code.

thanks!

PS. It'd be great too if I could obtain the accuracy and confusion matrices for the test set (not only the ROC AUC).
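
For the accuracy and confusion-matrix part of this question, a minimal Python sketch may help. It assumes the per-gene probabilities have already been exported from the model by some means; the labels, probabilities, and the 0.5 threshold below are made-up illustrations, not DeepChrome output.

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) for 0/1 labels and predicted probabilities.

    The threshold is an assumption; 0.5 is only a common default.
    """
    tn = fp = fn = tp = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if label == 1 and pred == 1:
            tp += 1
        elif label == 1 and pred == 0:
            fn += 1
        elif label == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Fraction of correctly classified samples."""
    total = tn + fp + fn + tp
    return (tn + tp) / total if total else float("nan")


# Hypothetical values for four genes, for illustration only.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7]
cm = confusion_matrix(y_true, y_prob)
print("tn, fp, fn, tp =", cm)
print("accuracy =", accuracy(*cm))
```

Keeping the probabilities rather than only the thresholded labels makes it possible to try other thresholds later.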

Reference genes and their expression ... ?

Dear Ritambhara,
After reading your paper, I cannot work out which criterion you used to obtain the 19802 gene-samples constituting your dataset before the train-test-val split.
There is no reference to any annotation file, and the procedure by which you associated those genes with the REMC expression quantifications is even less clear to me.

In the other issue reported here you suggested that gene TSSs were retrieved from the table that can be constructed at this address. Could you please be more precise? The table obtained by setting:

clade: Mammal
genome: Human
assembly: Feb 2009 (GRCh37/hg19)
group: Genes and Gene Predictions
track: UCSC Genes (???)
table: knownGene
region: genome

has 82,960 rows, clearly many more than the genes you investigated.

As far as the gene expression quantifications are concerned, REMC says here that those were built using the GENCODE v10 annotation file. The related files can be found here, but you were not clear about which of those files was used. It's unlikely that you used the file "57epigenomes.RPKM.pc.gz", because it contains fewer genes than those you considered (19795 vs 19802).
Moreover, supposing you used the table retrieved as described above to select reference genes and their TSSs, how did you translate the UCSC IDs into Ensembl IDs in order to pick the right expression line for each gene?

Thank you for the support, I hope you will kindly help me in replicating your experiments.

Cheers,
noired
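
Part of the row-count gap is expected, since knownGene holds one row per transcript rather than one per gene. The Python sketch below collapses transcript rows to a single TSS per gene by keeping the most upstream one. The tuple layout (gene ID, chromosome, strand, txStart, txEnd), the gene IDs, and the "most upstream" rule are all assumptions made for illustration; this is not the authors' confirmed procedure, and it does not address the UCSC-to-Ensembl ID mapping.

```python
def tss_per_gene(transcripts):
    """Collapse transcript rows to one TSS per gene.

    transcripts: iterable of (gene_id, chrom, strand, tx_start, tx_end).
    Returns {gene_id: (chrom, strand, tss)}.
    Coordinates are used as given; 0-based vs 1-based conventions are not adjusted.
    """
    best = {}
    for gene_id, chrom, strand, tx_start, tx_end in transcripts:
        # On the minus strand transcription starts at the larger coordinate.
        tss = tx_start if strand == "+" else tx_end
        if gene_id not in best:
            best[gene_id] = (chrom, strand, tss)
            continue
        current = best[gene_id][2]
        more_upstream = tss < current if strand == "+" else tss > current
        if more_upstream:
            best[gene_id] = (chrom, strand, tss)
    return best


# Hypothetical transcript rows for two genes.
rows = [
    ("G1", "chr1", "+", 1000, 5000),
    ("G1", "chr1", "+", 800, 5000),
    ("G2", "chr2", "-", 2000, 9000),
    ("G2", "chr2", "-", 2000, 9500),
]
for gene, info in tss_per_gene(rows).items():
    print(gene, info)
```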

Input data

Hello,

Many thanks for the package. I'm getting an error with the attached input. Specifically, the AUC score is returning 'nan' (in train.log). I can't see anything that I'm missing so any help would be appreciated!

Many thanks, Aidan
train.txt
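
In general, AUROC is undefined when a split contains only one class, which is one possible (unconfirmed) cause of a `nan` score. The Python sketch below counts the labels in a file to check for that. It assumes the label is the last comma-separated field on each line; that layout is an assumption and should be adjusted to the actual file, and the example lines are hypothetical.

```python
from collections import Counter


def label_counts(lines, label_col=-1, sep=","):
    """Count label values; label_col and sep are assumptions about the file layout."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[line.split(sep)[label_col]] += 1
    return counts


def auroc_is_defined(counts):
    """AUROC needs at least one sample from each of two classes."""
    return sum(1 for n in counts.values() if n > 0) >= 2


# Hypothetical lines where every row carries the same label.
example = ["g1,1,0,3,1", "g1,2,4,0,1", ""]
counts = label_counts(example)
print(dict(counts), "AUROC defined:", auroc_is_defined(counts))
```

To run it on a real file, pass `open("train.txt")` in place of `example`.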

Generating data

Hello,
I ran into a problem while generating the data. Could you please help me solve it? The details are below.
I downloaded the dataset from REMC and followed the README instructions on it. I ran this command:

bedtools bedtobam -i E128-H3K4me3.tagAlign -g hg19chrom.sizes > ft2.bam

but I got this error message:

Error: The requested genome file (hg19chrom.sizes) could not be opened. Exiting!

So I downloaded the file "hg19.chrom.sizes" and ran this command:

bedtools bedtobam -i E128-H3K4me3.tagAlign -g hg19.chrom.sizes > ft2.bam

and the error message went away. The command produced a BAM file, but when I ran "bedtools multicov" I got this error message:

bedtools multicov -bams ft2.bam -bed E128-H3K4me3.tagAlign
Could not open input BAM files.

So I installed "samtools" and ran these commands:

samtools sort ft2.bam > ft2.sort
samtools index ft2.sort
bedtools multicov -bams ft2.sort -bed E128-H3K4me3.tagAlign

and the error message went away.
I got the RPKM values from: http://egg2.wustl.edu/roadmap/data/byDataType/rna/expression/
How can I use the RPKM values with my files? Also, were the commands I used above correct?

Best Regards
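
One point worth checking in the commands above: `bedtools multicov` counts alignments from the BAM files over the intervals passed to `-bed`, so that file would normally hold the regions of interest (for example bins around each TSS) rather than the tagAlign reads themselves. The Python sketch below generates such bins as BED lines. The 5,000 bp flank and 100 bp bin size are parameters assumed here and should be checked against the paper; the gene name and coordinate are hypothetical.

```python
def tss_bins(chrom, tss, gene_id, flank=5000, bin_size=100):
    """Return BED lines (0-based, half-open) tiling [tss - flank, tss + flank).

    flank and bin_size are assumed defaults, not confirmed DeepChrome settings.
    """
    start = max(0, tss - flank)  # do not run off the start of the chromosome
    end = tss + flank
    lines = []
    for i, s in enumerate(range(start, end, bin_size), start=1):
        e = min(s + bin_size, end)
        lines.append(f"{chrom}\t{s}\t{e}\t{gene_id}_bin{i}")
    return lines


# Hypothetical gene with a TSS at position 10000 on chr1.
bins = tss_bins("chr1", 10000, "geneA")
print(len(bins), "bins")
print(bins[0])
print(bins[-1])
```

Writing these lines to a file gives an interval file that could be passed to `-bed`, with the sorted and indexed BAM files passed to `-bams`.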

Hardware requirements

Hi,
I want to run your code, but I don't know the minimum hardware requirements.
My computer has a Core i5 CPU and 4 GB of RAM. When I run DeepChrome I get this error message:
Segmentation fault (core dumped)................] ETA: 0ms | Step: 0ms
Does DeepChrome need a GPU?
Would you please help me solve this error?
Best Regards

AUROC and dataset

Hello,
I have two questions.
Do you have any other dataset similar to the toy dataset? I need one.
When I run DeepChrome with the toy dataset, the final output is:

==> time to learn 1 sample = 2.8497934341431ms
ConfusionMatrix:
[[ 5 0] 100.000% [class: 1]
[ 0 5]] 100.000% [class: 2]

  • average row correct: 100%
  • average rowUcol correct (VOC measure): 100%
  • global correct: 100%
  • AUROC: 1
    ==> saving model to /home/msfathalian/deepc/code/results/toy/model.99.net
    ==> testing on test set:
    [=================== 10/10 ===================>] Tot: 8ms | Step: 0ms

==> time to test 1 sample = 0.88992118835449ms
ConfusionMatrix:
[[ 1 1] 50.000% [class: 1]
[ 8 0]] 0.000% [class: 2]

  • average row correct: 25%
  • average rowUcol correct (VOC measure): 5.0000000745058%
  • global correct: 10%
  • AUROC: 0.4375

The AUROC is very low (0.4375). Why?
Best Regards
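
The reported test numbers can be recomputed from the test confusion matrix itself, which also shows how small the test set is: 2 samples of class 1 and 8 of class 2. The Python sketch below uses the usual per-row accuracy and intersection-over-union definitions, assumed here to match what the Torch confusion matrix prints.

```python
def summarize(cm):
    """Compute global accuracy, mean per-class accuracy and mean IoU
    from a square confusion matrix (rows = true class, columns = predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[r][i] for r in range(n)) for i in range(n)]
    row_acc = [diag[i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    union = [row_sums[i] + col_sums[i] - diag[i] for i in range(n)]
    iou = [diag[i] / union[i] if union[i] else 0.0 for i in range(n)]
    return {
        "global": sum(diag) / total,
        "avg_row": sum(row_acc) / n,
        "avg_iou": sum(iou) / n,
        "row_sums": row_sums,
    }


# Test-set confusion matrix copied from the log above.
stats = summarize([[1, 1], [8, 0]])
print(stats)

# AUROC compares every class-1 sample with every class-2 sample.
pairs = stats["row_sums"][0] * stats["row_sums"][1]
print("pairs:", pairs, "-> 0.4375 corresponds to", 0.4375 * pairs, "of them")
```

The results agree with the log (10% global, 25% average row, 5% VOC measure). With only 2 x 8 = 16 pairs, an AUROC of 0.4375 amounts to a score of 7 out of 16 pairwise comparisons (ties counted as half), so a single sample changing rank moves the metric noticeably. Together with the 100% score on a 10-sample training set, this looks like overfitting on a very small toy dataset rather than a meaningful estimate of model quality.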
