qdata / deepchrome
Bioinformatics16: DeepChrome: Deep-learning for predicting gene expression from histone modifications
Home Page: http://deepchrome.net
License: Apache License 2.0
Hello,
Could you please provide a detailed README describing how the data was generated? I referred to #2, but it is still confusing. Step-by-step instructions would benefit everyone.
Could you also explain the toy data? I understand from the paper that five selected histone modifications from REMC were used as x and the gene expression output (+1/-1) as y. I couldn't find the column labelling anywhere (which column denotes what, and which column is the output).
Thank you in advance.
Purvanshi
Hi,
I ran the pipeline on my data smoothly and got the ROC AUC for the train and test sets. However, I am not very familiar with Torch/Lua. How could I obtain the final predictions for each gene in the test set (either the 0/1 label or, better, the probability in [0,1])? I guess this just means adding or modifying a couple of lines of code.
thanks!
PS. It'd be great too if I could obtain the accuracy and confusion matrices for the test set (not only the ROC AUC).
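A note on the metrics asked about above: once per-gene probabilities have been exported from the model (how to do that in the Torch code is not shown here), accuracy and a confusion matrix can be computed outside Lua. The following is a minimal Python sketch, assuming labels are coded 0/1 and predictions are probabilities in [0, 1]. The function name and the 0.5 threshold are illustrative choices, not part of DeepChrome.

```python
def confusion_and_accuracy(labels, probs, threshold=0.5):
    """Return ([[TN, FP], [FN, TP]], accuracy) for 0/1 labels and probabilities.

    Hypothetical helper: the 0.5 cut-off is an assumption and can be changed.
    """
    tn = fp = fn = tp = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else float("nan")
    return [[tn, fp], [fn, tp]], accuracy


if __name__ == "__main__":
    # Made-up labels and probabilities, for illustration only.
    matrix, acc = confusion_and_accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
    print(matrix, acc)
```

Rows of the returned matrix are the true classes and columns are the predicted classes, so the diagonal holds the correctly classified genes.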
Dear Ritambhara,
from reading your paper, I cannot work out which criterion you used to obtain the 19802 gene-samples that constitute your dataset before the train-test-val split.
There is no reference to any annotation file, and the procedure by which you associated those genes with the REMC expression quantifications is even less clear to me.
In the other issue reported here you suggested that gene TSSs were retrieved from the table that can be constructed at this address. Could you please be more precise? The table obtained by setting:
clade: Mammal
genome: Human
assembly: Feb 2009 (GRCh37/hg19)
group: Genes and Gene Predictions
track: UCSC Genes (???)
table: knownGene
region: genome
has 82,960 rows, clearly many more than the genes you investigated.
As far as the gene expression quantifications are concerned, REMC says here that they were built using the GENCODE v10 annotation file. The related files can be found here, but you were not clear about which of those files was used. It's unlikely that you used the file "57epigenomes.RPKM.pc.gz", because it contains fewer genes than you considered (19795 vs 19802).
Moreover, supposing you used the table retrieved as described above to select reference genes and their TSSs, how did you translate the UCSC IDs into Ensembl IDs in order to pick the right expression row for each gene?
Thank you for the support, I hope you will kindly help me in replicating your experiments.
Cheers,
noired
Hello,
Many thanks for the package. I'm getting an error with the attached input. Specifically, the AUC score is returning 'nan' (in train.log). I can't see anything that I'm missing so any help would be appreciated!
Many thanks, Aidan
train.txt
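One possible cause worth ruling out, offered as a guess since the attached file is not shown here: an AUC is undefined when the evaluated set contains only one class, because it is a fraction over positive-negative pairs and the number of such pairs is zero. The Python sketch below counts those pairs for a list of labels. It assumes 0/1 labels, and the function name is hypothetical.

```python
from collections import Counter


def auc_pair_count(labels, positive=1):
    """Return (class counts, number of positive-negative pairs).

    The AUC is a ratio whose denominator is n_pos * n_neg, so a pair
    count of zero means the AUC cannot be computed for these labels.
    """
    counts = Counter(labels)
    n_pos = counts.get(positive, 0)
    n_neg = sum(n for label, n in counts.items() if label != positive)
    return dict(counts), n_pos * n_neg


if __name__ == "__main__":
    # Made-up label lists, for illustration only.
    print(auc_pair_count([1, 1, 1, 1]))  # single class: zero pairs
    print(auc_pair_count([1, 0, 0, 1]))  # both classes present
```

If the pair count is zero for the train, validation, or test split, the labels in that split are worth checking before looking at the model itself.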
Hello,
I ran into a problem while trying to run the pipeline. Could you please help me solve it? The details are below:
I downloaded a dataset from REMC and followed the README instructions on it. I ran this command:
bedtools bedtobam -i E128-H3K4me3.tagAlign -g hg19chrom.sizes > ft2.bam
but I got this error message:
Error: The requested genome file (hg19chrom.sizes) could not be opened. Exiting!
So I downloaded the file "hg19.chrom.sizes" and ran this command:
bedtools bedtobam -i E128-H3K4me3.tagAlign -g hg19.chrom.sizes > ft2.bam
and the error message went away. The command above produced a BAM file, but when I ran "bedtools multicov" I got this error message:
bedtools multicov -bams ft2.bam -bed E128-H3K4me3.tagAlign
Could not open input BAM files.
So I installed samtools and ran these commands:
samtools sort ft2.bam > ft2.sort
samtools index ft2.sort
bedtools multicov -bams ft2.sort -bed E128-H3K4me3.tagAlign
and the error message went away.
I got the RPKM values from: http://egg2.wustl.edu/roadmap/data/byDataType/rna/expression/
How can I use the RPKM values with my files? Also, were the commands I used above correct?
Best Regards
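A possible reading of the steps above, offered as a sketch rather than the authors' pipeline: `bedtools multicov` counts reads over the intervals given with `-bed`, so that file would normally hold the bins around each gene's TSS rather than the tagAlign reads themselves. The Python sketch below assumes the layout described in the DeepChrome paper (a 10,000 bp window centred on the TSS, split into 100 bins of 100 bp) and labels obtained by thresholding RPKM at the median across genes for one cell type. The function names, the bin naming scheme, the strict `>` comparison, and the choice to ignore strand are all illustrative assumptions.

```python
from statistics import median


def tss_bins(chrom, tss, gene_id, flank=5000, bin_size=100):
    """Return BED-style rows (chrom, start, end, name) tiling [tss-flank, tss+flank).

    Coordinates are 0-based and half-open, as in BED. Strand is ignored here.
    """
    window_start = tss - flank
    rows = []
    for i in range(2 * flank // bin_size):
        start = window_start + i * bin_size
        if start < 0:
            continue  # skip bins that would fall before the chromosome start
        rows.append((chrom, start, start + bin_size, f"{gene_id}_{i + 1}"))
    return rows


def binarize_by_median(rpkm_by_gene):
    """Map each gene to 1 if its RPKM is above the median across genes, else 0."""
    threshold = median(rpkm_by_gene.values())
    return {gene: int(value > threshold) for gene, value in rpkm_by_gene.items()}


if __name__ == "__main__":
    # Hypothetical gene and TSS position, for illustration only.
    for row in tss_bins("chr1", 10000, "geneA")[:3]:
        print("\t".join(str(field) for field in row))
    # Made-up RPKM values for three hypothetical genes.
    print(binarize_by_median({"geneA": 0.0, "geneB": 1.0, "geneC": 5.0}))
```

Writing the rows returned by `tss_bins` to a tab-separated file gives an interval file of the kind `bedtools multicov -bed` expects, with one count per bin in its output.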
Hi
I want to run your code, but I don't know the minimum hardware requirements.
My computer has a Core i5 CPU and 4 GB of RAM. When I run DeepChrome I get this error message:
Segmentation fault (core dumped)................] ETA: 0ms | Step: 0ms
Does DeepChrome need a GPU?
Could you please help me solve this error?
Best Regards
Hi!
I noticed that the csv file in the toy directory seemed to have only one sample. What should I do if I want to combine multiple samples to generate a model?
Hello,
I have two questions.
1. Do you have any other dataset similar to the toy dataset? I need another dataset like it.
2. When I run DeepChrome on the toy dataset, the final output is:
==> time to learn 1 sample = 2.8497934341431ms
ConfusionMatrix:
[[ 5 0] 100.000% [class: 1]
[ 0 5]] 100.000% [class: 2]
==> time to test 1 sample = 0.88992118835449ms
ConfusionMatrix:
[[ 1 1] 50.000% [class: 1]
[ 8 0]] 0.000% [class: 2]
The AUROC is very low (0.4375). Why?
Best Regards
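Some arithmetic on the reported number may help: the test confusion matrix above has 2 samples of class 1 and 8 of class 2, so there are only 2 x 8 = 16 cross-class pairs, and 0.4375 is exactly 7/16. With so few test samples the AUROC can only move in steps of 1/16 (1/32 with ties), so a value near 0.5 says little either way. The Python sketch below computes the AUROC as the fraction of correctly ordered pairs. The scores are made up to reproduce 7/16 and are not the model's actual outputs, and the function name is hypothetical.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Illustrative scores only: 2 samples of one class, 8 of the other.
    minority = [0.55, 0.25]
    majority = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    print(pairwise_auc(minority, majority))  # 7 of 16 pairs ordered correctly
```

Changing the order of a single pair moves the result by 0.0625, which shows how coarse the metric is on a 10-sample test set.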