team_undecided's People

Contributors

abaghela, allison-tai, echu113, emmagraham, santina

team_undecided's Issues

Notes for Monday Feb 6th Meeting

Hey people,

I will also try to put down what I remember of Amrit's conversation with us.

  • Look at differential network analysis. This can be used to see the differences in networks between the two conditions (asthma and normal).
  • We can also try to group patients together based on their methylation and RNA-seq profiles. We could then build networks from those patient groups and apply differential network analysis again, to see how the networks change among asthma endotypes (if any).
  • Try to download the data using the GEOquery package. It will give you all the data really nicely. I can't seem to do it from whatever computer I am on.
  • https://www.ncbi.nlm.nih.gov/pubmed/27473063

Differential expression analysis

  • conduct differential expression analysis using limma voom
  • identify genes that are differentially expressed between control and patient groups

WGCNA modules identification

  • conduct WGCNA on controls to cluster genes into different modules
  • identify the most interesting modules that we want to perform differential network analysis on
  • try to limit each module to fewer than 500 genes, probably okay?
  • output list of genes organized into WGCNA modules
  • in the subsequent step, we will identify the modules that have "higher connectivity" or "more connected genes" (between the two patient groups)
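A rough sketch of what "more connected genes" could mean in practice, assuming we use the WGCNA-style definition of connectivity (the sum of a gene's adjacencies, where adjacency is the absolute correlation raised to a soft-threshold power). This is plain Python for illustration, not the WGCNA package itself; the function names, the toy data, and beta=6 are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0

def connectivity(expr, beta=6):
    """Per-gene connectivity: sum of |cor|^beta to every other gene.

    expr maps gene id -> list of expression values (one per sample).
    beta is the soft-threshold power (assumed value, should be tuned).
    """
    genes = list(expr)
    return {
        g: sum(abs(pearson(expr[g], expr[h])) ** beta for h in genes if h != g)
        for g in genes
    }

def connectivity_difference(expr_a, expr_b, beta=6):
    """Connectivity in group A minus connectivity in group B, per gene."""
    k_a = connectivity(expr_a, beta)
    k_b = connectivity(expr_b, beta)
    return {g: k_a[g] - k_b[g] for g in k_a}

# Toy module of three genes measured in four samples per group (made-up numbers).
controls = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, -1, 1, -1]}
asthma = {"g1": [1, 2, 3, 4], "g2": [1, -1, 1, -1], "g3": [4, 3, 2, 1]}

diff = connectivity_difference(controls, asthma)
for gene, value in diff.items():
    print(gene, round(value, 3))
```

Genes with a large absolute difference are the ones whose connections change most between the two groups, which is what the later significance test would be run on.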

Pipeline PDF

I made a pretty flow chart for our poster (see attached). It also outlines what we need to do. I'll go ahead and create separate issues for each step in preparation for our little hackathon tomorrow.

STAT 540 - Analysis Pipeline.pdf

Clustering

  • log2-transform the data in order to reveal more variation
  • ensure we're using normalized data
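A minimal sketch of the log2 step, assuming the input is already a normalized matrix (rows = genes, columns = samples). The pseudocount of 1 is an assumption to avoid taking log2 of zero; the function name and numbers are only for illustration.

```python
from math import log2

def log2_transform(matrix, pseudocount=1.0):
    """Return log2(value + pseudocount) for every entry of a gene x sample matrix."""
    return [[log2(value + pseudocount) for value in row] for row in matrix]

# Made-up normalized values for two genes across three samples.
normalized = [[0.0, 1.0, 3.0], [7.0, 15.0, 1023.0]]
transformed = log2_transform(normalized)
print(transformed)
```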

Biological interpretation

  • now we have a list of modules / pathways that are significant from our differential network analysis statistics
  • what do they mean??

Initial proposal feedback

Name | Department/Program | Expertise/Interests | GitHub ID
Arjun Baghela | Bioinformatics | Immunology & Transcriptomics | @abaghela
Emma Graham | Bioinformatics | Machine Learning & Metabolomics | @emmagraham
Allison Tai | Bioinformatics | Machine Learning & DNA Structure | @faelicy
Eric Chu | Bioinformatics | Neuroscience & Transcriptomics | @echu113

Team name: Undecided

One paragraph on the basic idea of the project:

Asthma is characterized by chronic inflammation and affects over 400 million children and adults worldwide (1). The heterogeneity of the disease manifests as variation in clinical onset, responsiveness to treatment and comorbidities (2). Upstream events in the lung epithelial cells of the lower airway have been postulated to initiate Type II inflammation, which is mediated by CD4+ T cells, leading to cytokine production and remodeling of the cellular environment in the lower airway. Recent studies using RNA-seq data have characterized the Type II immune response in CD4+ T cells; however, the upstream events in epithelial cells that initiate this response remain unknown (3,4). A recently published study, which obtained RNA-seq and methylation profiles for 76 asthma patients, investigated the genetic and epigenetic markers upregulated in lower airway epithelial cells during asthmatic responses (5). However, the conclusions of that study are limited by numerous confounding factors such as medication usage, comorbidities and artefacts of experimentation, which can obscure the detection of meaningful biological signals. Furthermore, the generation of interactive networks with WGCNA (6) in that study may have removed meaningful connections in an attempt to reduce noise, and WGCNA also has difficulty incorporating heterogeneous data. We will analyze RNA-seq and methylation data from lung epithelial cells in subjects with and without asthma to determine master regulator genes that initiate the Type II inflammatory response in lung epithelial cells. To begin our analysis, the RNA-seq data will be processed to remove the effect of confounding variables and used to construct a co-expression network. Similarly, we will construct a co-expression network with differentially methylated CpGs (DMCs). Both of these networks may give us insights into the genetic and epigenetic signatures that influence variation in asthma endotypes. We may try other analyses too, if we have time. These include determining whether methylation levels at DMCs are correlated with expression levels of nearby genes, and integrating DMC and RNA-seq data using a network-interaction based approach.

References

  1. Pawankar R. 2014. Allergic diseases and asthma: a global public health concern and a call to action. World Allergy Organ. J. 7: 12.
  2. Wesolowska-Andersen A, Seibold MA. Airway molecular endotypes of asthma: dissecting the heterogeneity. Curr Opin Allergy Clin Immunol. 2015;15(2):163–168. doi:
  3. Locksley RM. Asthma and allergic inflammation. Cell. 2010;140:777–783.
  4. Seumois, Grégory, et al. "Transcriptional profiling of Th2 cells identifies pathogenic features associated with asthma." The Journal of Immunology 197.2 (2016): 655-664
  5. Nicodemus-Johnson, Jessie et al. “DNA Methylation in Lung Cells Is Associated with Asthma Endotypes and Genetic Risk.” JCI Insight 1.20 (2016): e90151. PMC. Web. 26 Jan. 2017.
  6. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi:10.1186/1471-2105-9-559.

Test for significance by bootstrapping

  • Use bootstrapping to generate distribution of differential network analysis statistics
  • resample the samples with replacement, mixing the samples of the two groups
  • conduct the same differential network analysis tests on each resampled dataset... repeat many many many times
  • a p-value can be computed as the proportion of values in the resampled distribution that are at least as large as the value obtained in the real test
  • adjust for multiple testing given a list of p-values
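The steps above could be sketched roughly like this. It is only an illustration under assumptions: the statistic here is a placeholder (the absolute difference in correlation of a single gene pair between groups), the function names are made up, the +1 in the p-value is an added convention so the estimate is never exactly zero, and Benjamini-Hochberg is just one possible choice for the multiple-testing adjustment.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom > 0 else 0.0

def edge_difference(group_a, group_b):
    """Placeholder statistic: |cor_A - cor_B| for one gene pair.

    Each group is a list of (gene1_value, gene2_value) tuples, one per sample.
    """
    cor_a = pearson([s[0] for s in group_a], [s[1] for s in group_a])
    cor_b = pearson([s[0] for s in group_b], [s[1] for s in group_b])
    return abs(cor_a - cor_b)

def pooled_bootstrap_pvalue(group_a, group_b, statistic, n_resamples=2000, seed=0):
    """Resample with replacement from the mixed groups and compare to the real value."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_large = 0
    for _ in range(n_resamples):
        fake_a = rng.choices(pooled, k=len(group_a))  # sampling with replacement
        fake_b = rng.choices(pooled, k=len(group_b))
        if statistic(fake_a, fake_b) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_resamples + 1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up data: the gene pair is positively correlated in one group, negatively in the other.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
group_a = list(zip(xs, [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9]))
group_b = list(zip(xs, [8.2, 6.9, 6.1, 5.2, 3.8, 3.1, 1.9, 1.2]))

p_value = pooled_bootstrap_pvalue(group_a, group_b, edge_difference)
print("p-value:", p_value)
print("adjusted:", benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In the real pipeline the statistic would be whatever the differential network analysis produces per module or per gene, and the adjustment would be applied across all of the resulting p-values.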

Data preprocessing

  • Normalize counts by library size - can be done using voom?

  • Filter out lowly expressed genes (fewer than 3 counts per million?)

  • any other necessary preprocessing

  • produce PCA figures to illustrate why no further processing is necessary?
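A small sketch of the library-size normalization and low-expression filter, assuming a raw count matrix with rows = genes and columns = samples. The 3 counts-per-million cutoff comes from the note above; requiring it in at least 2 samples is an assumption, as are the function names and toy counts.

```python
def counts_per_million(counts):
    """Scale each column (sample) of a gene x sample count matrix to counts per million."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)] for row in counts]

def filter_low_expression(counts, gene_ids, min_cpm=3.0, min_samples=2):
    """Keep genes whose CPM reaches min_cpm in at least min_samples samples.

    min_samples is an assumed parameter; the notes only give the CPM cutoff.
    """
    cpm = counts_per_million(counts)
    keep = [sum(value >= min_cpm for value in row) >= min_samples for row in cpm]
    kept_ids = [g for g, k in zip(gene_ids, keep) if k]
    kept_counts = [row for row, k in zip(counts, keep) if k]
    return kept_ids, kept_counts

# Made-up counts: each sample has a library size of exactly one million reads.
gene_ids = ["geneA", "geneB", "geneC"]
counts = [
    [600000, 700000, 650000, 620000],
    [1, 0, 2, 1],
    [399999, 300000, 349998, 379999],
]

kept_ids, kept_counts = filter_low_expression(counts, gene_ids)
print(kept_ids)
```

Here geneB never gets above 2 counts per million, so it is dropped and the other two genes are kept.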
