bushmanlab / genetherapypatientreportmaker
This project forked from esherm/genetherapypatientreportmaker
Single patient integration sites report
License: GNU General Public License v3.0
There is a small general issue I wanted to bring up before I forget: in both the runstats and the reports, samples with VCN = 0 are being given a VCN of NA. I realize this is probably small beans right now, but it will be useful to make the correction eventually.
Put genemark in a function and add an on-the-fly check that the genemarks really mean what they say. Yinghua's job.
Add an on-the-fly check that the sum of estAbunProp over unique posid for each trial, patient, time, and cell is close to 1. Chris's job?
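A minimal sketch of such a check in base R. The data frame name `sites`, the grouping column names `trial`, `patient`, `timepoint`, and `celltype`, the helper name, and the tolerance are all assumptions made for illustration; only `estAbunProp` and `posid` come from the note above.

```r
# Hypothetical helper: verify that estAbunProp sums to ~1 within each
# trial/patient/timepoint/celltype group, counting each posid once.
checkAbundanceProportions <- function(sites, tol = 1e-6) {
  groupCols <- c("trial", "patient", "timepoint", "celltype")
  # Keep one row per posid within each group
  uniq <- sites[!duplicated(sites[, c(groupCols, "posid")]), ]
  # Sum the proportions per group
  sums <- aggregate(uniq["estAbunProp"], by = uniq[groupCols], FUN = sum)
  bad <- sums[abs(sums$estAbunProp - 1) > tol, ]
  if (nrow(bad) > 0) {
    stop("estAbunProp does not sum to 1 for ", nrow(bad), " group(s)")
  }
  invisible(TRUE)
}
```

Calling this right after the abundance table is built would make the report fail early instead of producing plots from inconsistent proportions.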
The TOC is good for HTML but should be removed from the PDF.
If the legend is extensive, it pushes the plots to the side instead of spreading out. We need to set a dynamic height for the legend, possibly based on cell type.
@cnobles
Remove the folder with figures each time before rerunning.
library(devtools)
source_url('https://raw.github.com/hadley/stringr/master/R/c.r')
Hi Guys
You need to fix the panel under "longitudinal behavior of major clones" (attached). It is showing as two separate panels, but both samples are PBMC from two time points. Please make one graph, with lines connecting similar clones.
Thanks
Rick
Turned out to be a whitespace problem: "PBMC" vs "PBMC ".
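A trailing space like this can be stripped before samples are grouped. The sketch below uses base R's `trimws()`; the `samples` data frame and its GTSP identifiers are made up for illustration and are not taken from the report code.

```r
# Hypothetical sample table with a trailing space in one celltype label
samples <- data.frame(
  GTSP = c("GTSP0001", "GTSP0002"),
  celltype = c("PBMC", "PBMC "),
  stringsAsFactors = FALSE
)

# Before cleaning, the two labels are treated as different cell types
nBefore <- length(unique(samples$celltype))

# Strip leading and trailing whitespace so both rows share one label
samples$celltype <- trimws(samples$celltype)
nAfter <- length(unique(samples$celltype))
```

Normalizing the labels when the sample metadata is read keeps both time points in the same panel.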
Yinghua has communicated with Chuck, and we are assuming that any repeats of a distinct width for a specific integration site are PCR replicates, a result of amplification. We should therefore pass only the unique lengths from a single site and replicate into estAbund. I think this could be done by just wrapping dplyr::distinct() around the df made at the beginning of the estimateAbundances script.
What are your thoughts?
For example, an in-depth analysis of pAK m6:
          |         in-depth analysis          |  report
GTSP      |  uniSites   multiSites  totalSites | uniSites
GTSP0470  |     19455          536       19991 |    20528
GTSP0467  |      4908           39        4947 |     4982
GTSP0469  |      9720         1010       10730 |    10137
GTSP0471  |       953         1864        2817 |      969
GTSP0473  |       249           91         340 |      530
GTSP0472  |        56            9          65 |      155
GTSP0468  |       334          113         447 |      354
The number of unique sites should be calculated for each GTSP, not calculated for each replicate and then added together. That may be what is going on here; one way to solve the problem is to ask for length(unique(sites[GTSP]$posID)) after all the replicates have been standardized together.
Otherwise the issue may lie in the way we are standardizing sites, calling several sites near each other unique because the tail for the original site was > 5 nt.
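The difference between the two counting strategies can be shown on a toy example. This is only a sketch: the `sites` data frame, its column layout, and the GTSP labels are hypothetical, and the real objects in the pipeline may be structured differently.

```r
# Hypothetical standardized sites: one row per site observed in a replicate
sites <- data.frame(
  GTSP = c("GTSP_A", "GTSP_A", "GTSP_A", "GTSP_B"),
  replicate = c(1, 2, 2, 1),
  posID = c("chr1+100", "chr1+100", "chr2-200", "chr1+100"),
  stringsAsFactors = FALSE
)

# Counting per replicate and summing double-counts sites shared by replicates
perReplicate <- tapply(sites$posID, list(sites$GTSP, sites$replicate),
                       function(x) length(unique(x)))
summed <- rowSums(perReplicate, na.rm = TRUE)

# Counting once per GTSP after pooling the replicates
perGTSP <- tapply(sites$posID, sites$GTSP, function(x) length(unique(x)))
```

Here the site chr1+100 appears in both replicates of GTSP_A, so the summed count exceeds the pooled count for that sample.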
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
subscript contains NAs or out-of-bounds indices
Calls: getAbundanceThreshold ... extractROWS -> normalizeSingleBracketSubscript -> NSBS -> NSBS
Execution halted
Analysis of integration site distributions and relative clonal abundance
Trial: CART
Subject: subject p959-1
The dynamic height of the figures is set incorrectly when there are 3, or a multiple of 3, cell types. This affects both the barplots by sample and the heatmaps. The height should adjust so that there is 1 row for up to 3 barplots, 2 rows for 4-6, and so on. Currently 3 barplots get 2 rows, leaving an entire row empty.
Heatmaps get extended if there are 6 or a multiple of 6.
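The expected row count is a ceiling division. The sketch below is an assumption about how the layout could be computed, not the report's actual code; `panelRows` and `perRow` are hypothetical names. An off-by-one of this kind would, for instance, arise from a formula such as floor(n / perRow) + 1, which adds an empty row whenever n is an exact multiple of perRow.

```r
# Hypothetical helper: rows needed to lay out n panels, perRow panels per row
panelRows <- function(n, perRow = 3) {
  ceiling(n / perRow)
}

rowsFor3 <- panelRows(3)             # 3 barplots fit in a single row
rowsFor4 <- panelRows(4)             # 4 to 6 barplots need two rows
rowsFor6 <- panelRows(6, perRow = 6) # 6 heatmaps per row fit in a single row
```

The figure height can then be set to the row count multiplied by a fixed per-row height.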
we need to move from 237 to 98
> abundCutoff.detailed <- getAbundanceThreshold(standardizedDereplicatedSites, 50)
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
subscript contains NAs or out-of-bounds indices
use.sonicLength doesn't seem to make any difference. Check the code.
Generating report from the following sets
sampleName GTSP patient celltype timepoint
1 GTSP0436-1 GTSP0436 pSINSCID_UK01 PBMC m6
2 GTSP0436-2 GTSP0436 pSINSCID_UK01 PBMC m6
3 GTSP0436-3 GTSP0436 pSINSCID_UK01 PBMC m6
4 GTSP0436-4 GTSP0436 pSINSCID_UK01 PBMC m6
5 GTSP0437-1 GTSP0437 pSINSCID_UK01 Neutrophils m6
6 GTSP0437-2 GTSP0437 pSINSCID_UK01 Neutrophils m6
7 GTSP0437-3 GTSP0437 pSINSCID_UK01 Neutrophils m6
8 GTSP0437-4 GTSP0437 pSINSCID_UK01 Neutrophils m6
9 GTSP0438-1 GTSP0438 pSINSCID_UK01 Tcells m6
10 GTSP0438-2 GTSP0438 pSINSCID_UK01 Tcells m6
11 GTSP0438-3 GTSP0438 pSINSCID_UK01 Tcells m6
12 GTSP0438-4 GTSP0438 pSINSCID_UK01 Tcells m6
13 GTSP0440-1 GTSP0440 pSINSCID_UK01 NKcells m6
14 GTSP0440-2 GTSP0440 pSINSCID_UK01 NKcells m6
15 GTSP0440-3 GTSP0440 pSINSCID_UK01 NKcells m6
16 GTSP0440-4 GTSP0440 pSINSCID_UK01 NKcells m6
17 GTSP0441-1 GTSP0441 pSINSCID_UK01 Monocytes m6
18 GTSP0441-2 GTSP0441 pSINSCID_UK01 Monocytes m6
19 GTSP0441-3 GTSP0441 pSINSCID_UK01 Monocytes m6
20 GTSP0441-4 GTSP0441 pSINSCID_UK01 Monocytes m6
Error in rep(1:length(tframe.list), sapply(tframe.list, nrow)) :
invalid 'times' argument
Calls: lapply ... lapply -> FUN -> getEstimatedAbundance -> estAbund -> factor
Execution halted
The code below is not working:
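if(use.sonicLength){
  # Estimate abundance as the number of unique fragment lengths per site
  estAbund.uniqueFragLen <- function(location, fragLen, replicate=NULL){
    if(is.null(replicate)){replicate <- 1}  # Needed for downstream workflow
    dfr <- data.frame(location = location, fragLen = fragLen,
                      replicate = replicate)
    # Drop repeated (location, fragLen, replicate) rows; needs dplyr installed
    dfr_dist <- dplyr::distinct(dfr)
    site_list <- split(dfr_dist, dfr_dist$location)
    theta <- sapply(site_list, nrow)
    # Index by name so the result follows the first appearance of each location
    theta <- theta[as.character(unique(dfr$location))]
    list(theta=theta)
  }
  estAbund <- estAbund.uniqueFragLen
}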
Please generate all reports before merging.
p03712-08.log:Execution halted
p04409-18.log:Execution halted
p04409-26.log:Execution halted
p959-101.log:Execution halted
pFR03.log:Execution halted
pSINSCID_UK01.log:Execution halted
pWAS00002.log:Execution halted
if report is in ~ Ok
if in run folder ~ fails
unlink("CancerGeneList", force=TRUE, recursive=TRUE)
cmd <- "git clone https://github.com/BushmanLab/CancerGeneList.git"
message(cmd)
stopifnot( system(cmd)==0 )
source("CancerGeneList/onco_genes.R")
unlink("intSiteCaller", force=TRUE, recursive=TRUE)
cmd <- "git clone https://github.com/BushmanLab/intSiteCaller.git"
message(cmd)
stopifnot( system(cmd)==0 )
source("intSiteCaller/hiReadsProcessor.R")
source("intSiteCaller/standardization_based_on_clustering.R")
should be removed
In our standard database schema, these two are kept in different locations and should point to two different places to get the information (specimen_management and intSites.db).