compgenomr / book
Home Page: http://compgenomr.github.io/book
While the license is clearly indicated on the landing page of the book, it is missing from the repository root. Please consider adding the license.
I cannot find the code used to create 'df' (table for expression data of leukemia patients) in clustering section 4.1.1.
Hello
The code on lines 530-558 of 06-genomicIntervals.Rmd does not run successfully:
# get transcription start sites on chr20
library(genomation)
transcriptFile=system.file("extdata",
                           "refseq.hg19.chr20.bed",
                           package="compGenomRData")
feat=readTranscriptFeatures(transcriptFile,
                            remove.unusual = TRUE,
                            up.flank = 500, down.flank = 500)
prom=feat$promoters # get promoters from the features
# get H3K4me3 values around TSSes
# we use strand.aware=TRUE so - strands will
# be reversed
H3K4me3File=system.file("extdata",
                        "H1.ESC.H3K4me3.chr20.bw",
                        package="compGenomRData")
sm=ScoreMatrix(H3K4me3File, prom,
               type="bigWig", strand.aware = TRUE)
Error in validObject(.Object) :
invalid class "ScoreMatrix" object: superclass "mMatrix" not defined in the environment of the object's class
How should I solve this?
Thanks
Stats chapter:
How does the estimate from the random samples change if we simulate more data with data=matrix(rnorm(6000,mean=200,sd=70),ncol=6)
should be
How does the estimate from the random samples change if we simulate more data with data=matrix(rnorm(6000,mean=200,sd=70),ncol=6) keeping the number of samples per dataset constant, as n=6.
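To make the suggested wording concrete, here is a minimal sketch in which each row of the matrix is one simulated dataset of n=6 samples, so 6000 draws give 1000 datasets. The seed, the use of row means, and the variable names are assumptions for illustration and are not taken from the book.

```r
set.seed(42)  # assumed seed, only for reproducibility

# 6000 draws arranged as 1000 simulated datasets (rows) of n = 6 samples each
data <- matrix(rnorm(6000, mean = 200, sd = 70), ncol = 6)

# one sample mean per simulated dataset
sampleMeans <- rowMeans(data)

# the spread of the sample means should be close to sd / sqrt(n)
observedSE <- sd(sampleMeans)
expectedSE <- 70 / sqrt(6)
c(observed = observedSE, expected = expectedSE)
```

Changing the first argument of rnorm while keeping ncol=6 changes only the number of simulated datasets, not the number of samples per dataset.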
unsupervised learning chapter
reconstruction question should be:
Our next tasks are to remove eigenvectors and reconstruct the matrix using SVD, then calculate the reconstruction error as the difference between the original and the reconstructed matrix. Remove a few eigenvectors, reconstruct the matrix and calculate the reconstruction error. The reconstruction error can be the Euclidean distance between the original and reconstructed matrices.
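A minimal sketch of what the reworded exercise asks for, using base R only. The random matrix is a hypothetical stand-in for the book's data, and the helper names reconstruct and reconError, as well as the choice of keeping 3 components, are assumptions made for illustration.

```r
set.seed(1)  # assumed seed, only for reproducibility

# hypothetical stand-in for the matrix used in the exercise
mat <- matrix(rnorm(200), nrow = 20, ncol = 10)

s <- svd(mat)

# rebuild the matrix from the first k singular vectors only
reconstruct <- function(s, k) {
  s$u[, 1:k, drop = FALSE] %*%
    diag(s$d[1:k], nrow = k) %*%
    t(s$v[, 1:k, drop = FALSE])
}

# reconstruction error as the Euclidean (Frobenius) distance between matrices
reconError <- function(original, approx) sqrt(sum((original - approx)^2))

errFull <- reconError(mat, reconstruct(s, 10))  # all components kept
errK3   <- reconError(mat, reconstruct(s, 3))   # last 7 components removed
c(full = errFull, k3 = errK3)
```

With all components kept the error is zero up to floating-point noise, and it grows as more components are removed.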
I am trying to download the PDF version of the book, but I keep getting a 404 page saying
There isn't a GitHub Pages site here.
Text was 'cut and pasted' from electronic version of book at https://compgenomr.github.io/book/
A small amount of follow-up was carried out to see if there was a simple explanation or workaround, but no attempt was made to go much beyond what someone fairly new to R might achieve.
p182
# fit logistic regression model
# method and family define the type of regression
# in this case these arguments mean that we are doing logistic
# regression
lrFit = train(subtype ~ PDPN,
              data=training, trControl=trainControl("none"),
              method="glm", family="binomial")
Error in eval(predvars, data, env) : object 'PDPN' not found
Strangely, while it did not work with PDPN, it worked with other genes, e.g. CBLN1 and DDX3Y.
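One possible explanation, stated here as an assumption rather than a confirmed diagnosis, is that PDPN was dropped from the training data by an earlier filtering step, so the formula refers to a column that no longer exists. A quick check could look like the sketch below. The helper missingFormulaVars and the toy data frame are hypothetical and only illustrate the idea.

```r
# hypothetical helper: list formula variables that are absent from the data
missingFormulaVars <- function(formula, data) {
  setdiff(all.vars(formula), colnames(data))
}

# toy stand-in for the 'training' data frame; PDPN is deliberately absent
training <- data.frame(subtype = factor(c("CIMP", "noCIMP", "CIMP")),
                       CBLN1 = c(1.2, 0.4, 2.2),
                       DDX3Y = c(0.1, 3.5, 0.2))

missingFormulaVars(subtype ~ PDPN, training)   # reports the missing gene
missingFormulaVars(subtype ~ CBLN1, training)  # nothing missing
```

If the gene is reported as missing, picking any gene from colnames(training) should let the train call run.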
P209
require(rtracklayer)
session <- browserSession("UCSC",url = 'http://genome-euro.ucsc.edu/cgi-bin/')
genome(session) <- "mm9"
# choose CpG island track on chr12
query <- ucscTableQuery(session, track="CpG Islands",table="cpgIslandExt",
                        range=GRangesForUCSCGenome("mm9", "chr12"))
Error in GRangesForGenome(genome, chrom = chrom, ranges = ranges, method = "UCSC", :
Failed to obtain information for genome 'mm9'
# get the GRanges object for the track
track(query)
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'track': object 'query' not found
P211
library(genomation)
Warning message:
replacing previous import ‘Biostrings::pattern’ by ‘grid::pattern’ when loading ‘genomation’
filePathPeaks=system.file("extdata",
                          "wgEncodeHaibTfbsGm12878Sp1Pcr1xPkRep1.broadPeak.gz",
                          package="compGenomRData")
# read the peaks from a bed file
pk1.gr=readBroadPeak(filePathPeaks)
Error: No such process
# get the peaks that overlap with CpG islands
subsetByOverlaps(pk1.gr,cpgi.gr)
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'subsetByOverlaps': object 'pk1.gr' not found
P217
library(rtracklayer)
# File from ENCODE ChIP-seq tracks
bwFile=system.file("extdata","wgEncodeHaibTfbsA549.chr21.bw",package="compGenomRData")
bw.gr=import(bwFile, which=promoter.gr) # get coverage vectors
Error in .local(con, format, text, ...) : UCSC library operation failed
In addition: Warning message:
In .local(con, format, text, ...) : Invalid argument
lseek(3, 844957, invalid 'whence' value (1822621639)) failed
Leading to subsequent errors in rest of section
P225
gene.track <- BiomartGeneRegionTrack(genome = "hg19",
                                     chromosome = "chr21",
                                     start = 27698681, end = 28083310,
                                     name = "ENSEMBL")
Error in gzfile(file, mode) : cannot open the connection
Leading to subsequent errors in rest of section
P239
library(Rqc)
folder = system.file(package="ShortRead", "extdata/E-MTAB-1147")
# feeds fastq.gz files in "folder" to quality check function
qcRes=rqc(path = folder, pattern = ".fastq.gz", openBrowser=FALSE)
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
In addition: Warning messages:
1: In normalizePath(path.expand(path), winslash, mustWork) :
path[1]="C:\Users\david\AppData\Local\Temp\Rtmpg7tnGG": The system cannot find the file specified
2: In (function (filename = if (onefile) "Rplots.svg" else "Rplot%03d.svg", :
cairo error 'error while writing to output stream'
3: In file(file, ifelse(append, "a", "w")) :
cannot open file 'C:\Users\david\AppData\Local\Temp\Rtmpg7tnGG/rqc_report.md': No such file or directory
rqcCycleQualityBoxPlot(qcRes)
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'perCycleQuality': object 'qcRes' not found
Leading to subsequent errors in rest of section
P243
install.packages("astqcr")
Installing package into 'C:/Users/david/Documents/R/win-library/4.1'
(as 'lib' is unspecified)
Warning: unable to access index for repository https://cran.ma.imperial.ac.uk/src/contrib:
cannot open destfile 'C:\Users\david\AppData\Local\Temp\Rtmpg7tnGG\file1eb47b7e36f1', reason 'No such file or directory'
Warning: unable to access index for repository https://cran.ma.imperial.ac.uk/bin/windows/contrib/4.1:
cannot open destfile 'C:\Users\david\AppData\Local\Temp\Rtmpg7tnGG\file1eb47c802d57', reason 'No such file or directory'
Warning message:
package 'astqcr' is not available for this version of R
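Both the Rqc failure on P239 and the install.packages warnings above mention the same session temporary directory (Rtmpg7tnGG) as missing. If that directory was deleted while the R session was still open, which is an assumption based only on the messages shown, it can be recreated without restarting R. A minimal base R sketch:

```r
# does the session temporary directory still exist?
dir.exists(tempdir())

# tempdir(check = TRUE) recreates the directory if it has been removed
tmp <- tempdir(check = TRUE)

# confirm that files can be written there again
testFile <- tempfile(fileext = ".txt")
writeLines("ok", testFile)
file.exists(testFile)
```

Restarting the R session has the same effect, since a fresh temporary directory is created at startup.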
p245
# write out fastq file with only reads where all
# quality scores per base are above 20.
writeFastq(fq[qcount == 0],
           paste(fastqFile, "Qfiltered", sep="_"))
Error: UserArgumentMismatch
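The full error message is not shown, so the cause can only be guessed. One possibility, offered as an assumption, is that the output file already exists from an earlier run and writeFastq is refusing to overwrite it. The base R sketch below removes a stale output file before writing. The path outFile is a hypothetical stand-in for paste(fastqFile, "Qfiltered", sep="_").

```r
# hypothetical stand-in for paste(fastqFile, "Qfiltered", sep = "_")
outFile <- file.path(tempdir(check = TRUE), "example.fastq.gz_Qfiltered")

# simulate a leftover file from an earlier run
writeLines("stale", outFile)

# remove the stale output before writing again
if (file.exists(outFile)) {
  file.remove(outFile)
}

# the call writeFastq(fq[qcount == 0], outFile) would go here
file.exists(outFile)
```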
P270
plotPCA(countsNormalized[selectedGenes,],
        col = as.numeric(colData$group), adj = 0.5,
        xlim = c(-0.5, 0.5), ylim = c(-0.5, 0.6))
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'plotPCA' for signature '"matrix"'
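The message says that no plotPCA method for a plain matrix is available in the current session, which suggests that the package supplying such a method is not attached. Which package that is would need checking against the chapter. As a package-independent alternative, the sketch below draws a comparable plot with base R's prcomp. This is a substitute technique and not the book's function, and the simulated counts and group labels are hypothetical stand-ins for countsNormalized[selectedGenes,] and colData$group.

```r
set.seed(7)  # assumed seed, only for reproducibility

# hypothetical stand-in for countsNormalized[selectedGenes, ]: genes x samples
counts <- matrix(rpois(100 * 6, lambda = 50), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))

# hypothetical stand-in for colData$group
group <- factor(rep(c("case", "ctrl"), each = 3))

# prcomp expects samples in rows, so transpose after log-transforming
pca <- prcomp(t(log2(counts + 1)))

# first two principal components, colored by group
plot(pca$x[, 1], pca$x[, 2], col = as.numeric(group), pch = 19,
     xlab = "PC1", ylab = "PC2")
```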
Hello! I'm unable to open this in a PDF format, and instead have to access the book through the web interface. This is fine, but there is a PDF link/icon at the top right of the text, and it takes you to a 404 error.
I came across your bookdown while searching for how to display a correlation matrix with a hierarchical clustering tree. I noticed that your corrplot(correlationMatrix, order = 'hclust', addrect = 2) plot doesn't match the pheatmap plot below it in terms of the variables' order and clustering. This is because corrplot takes the correlation matrix as a distance matrix and runs hclust directly on it. Meanwhile, pheatmap treats the correlation matrix as a normal data set and recalculates the distance matrix before feeding it into hclust.
To make the two plots consistent with each other, I suggest adding two arguments (clustering_distance_rows and clustering_distance_cols) to the pheatmap call. They tell pheatmap to use the current correlation matrix as the distance matrix. The 1 - is there to ensure that perfect positive correlation (1) is treated as the minimum distance and perfect negative correlation (-1) as the maximum distance.
pheatmap(correlationMatrix,
         clustering_distance_rows = as.dist(1 - correlationMatrix),
         clustering_distance_cols = as.dist(1 - correlationMatrix))
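The difference between the two approaches can be inspected with base R alone. The sketch below is only an illustration: it uses the built-in mtcars data as a stand-in for the book's correlationMatrix and compares clustering on 1 - correlation with clustering on Euclidean distances between rows of the correlation matrix, which is pheatmap's default distance.

```r
# stand-in correlation matrix from a built-in dataset
correlationMatrix <- cor(mtcars)

# clustering on a correlation-derived distance (1 - r)
hcCor <- hclust(as.dist(1 - correlationMatrix))

# clustering that treats rows of the correlation matrix as ordinary data:
# Euclidean distance between rows
hcEuc <- hclust(dist(correlationMatrix))

# compare the leaf orders of the two trees side by side
data.frame(byCorrelation = hcCor$labels[hcCor$order],
           byEuclidean   = hcEuc$labels[hcEuc$order])
```

Where the two columns differ, the plots will order the variables differently unless the distance is set explicitly as suggested above.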
Hello,
There is a typo in section 2.6.1:
library(data.table)
df.f=d(enhancerFilePath, header = FALSE,data.table=FALSE)
It should say:
library(data.table)
df.f=fread(enhancerFilePath, header = FALSE,data.table=FALSE)
Thanks,
Carlos.
5.1.1 "However, while doing so[, the] field of statistics developed..."
5.2 Enumeration sometimes has periods at the end, sometimes not.
"algorithm becomes relevant.[ ]“Training” generally"
5.4.2: library(caret) is loaded too late/inconsistently
5.4.3: "Removing genes or samples have both downsides." Please ask a native English speaker. Recommendation:
"Both removing genes and removing samples have downsides"
5.5.1: "For starters, we will split the 30% of the data as test." Which the?
5.7: "Accuracy is the first metric to look at. This metric is is simply..." Double is.
"# gte k-NN prediction on the training data itself, with k=5". gte?
5.12 "Another variable we can tune is the minimum node size of terminal nodes in the trees (min.node.size). This controls the depth of the trees grown. Setting this to larger numbers might cost a small loss in accuracy but the algorithm will run faster." Shouldn't it mean smaller numbers?
Hi,
I just want to report typos you may have missed:
Chapter 3 > 3.1.2 Describing the spread: measurements of variation, in the probability section.
You have written:
In this case, what we want is the are under the curve shaded in blue. To be able to that we need to integrate the probability density function but we will usually let
And then in the following paragraph:
After calculating the Z-score, we can go look up in a table, that contains the area under the curve for the left and right side of the Z-score, but again we use software for that tables are outdated.
Thank you so so much for such useful content!
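For readers following the quoted passages on Z-scores, the "use software" step can be done with base R's pnorm. The values of x, mu and sigma below are hypothetical and chosen only for illustration.

```r
# hypothetical observation and distribution parameters
x <- 2
mu <- 0
sigma <- 1

# Z-score of the observation
z <- (x - mu) / sigma

# areas under the standard normal curve on either side of the Z-score
left  <- pnorm(z)                      # area to the left
right <- pnorm(z, lower.tail = FALSE)  # area to the right
c(left = left, right = right)
```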