datascience.curriculum's Introduction

Welcome to the Computational Data Analysis Workshop

This course is designed to help you improve your data management and analysis skills. The skills you will learn can be used for everything from straightforward measurements of biological data to high-dimensional data like scRNA-seq.

Have you ever looked at a figure you made last year and been unable to remember how you made it? Do you need to write a Data Sharing Plan for an NIH grant but don't know where to begin? Understanding the principles we cover here will help you and others reproduce and extend your work.

The 2023 course has already started, but you can still join in by emailing [email protected].

Dates for the 2023 workshop:

  • April 12
  • April 19
  • April 26
  • May 3
  • May 10
  • May 17

All sessions are on Wednesday mornings from 10-11 AM ET. The format is a Zoom webinar.

Goals

The goals of this workshop are for you to become comfortable with the R statistical computing language and associated computational technology:

  • RStudio for interacting with R
  • git for version control
  • GitHub to share your work
  • R package development to keep your code and data separate and organized

Specific Objectives

Once you have completed this course you should be able to:

  • properly format data for efficient computation
  • generate a table of descriptive statistics for data from a typical biological experiment
  • perform statistical testing as appropriate
  • generate publication-quality plots using ggplot2
  • perform basic analysis of single cell RNA sequencing data
  • compile processed source data into an R data package
  • understand the difference between analysis code and source data
  • use basic version control functions to track and document changes to your analysis
  • publish your code so reviewers can understand how you arrived at your results
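As a rough illustration of the descriptive-statistics objective, here is a minimal base R sketch using the built-in iris dataset. The helper name `summarise_by_group` is made up for this example and is not part of the course materials, which may take a different approach.

```r
# Hypothetical helper: count, mean, and standard deviation of one numeric
# column for each level of a grouping column.
summarise_by_group <- function(data, value, group) {
  groups <- split(data[[value]], data[[group]])
  data.frame(
    group = names(groups),
    n = vapply(groups, length, integer(1)),
    mean = vapply(groups, mean, numeric(1)),
    sd = vapply(groups, sd, numeric(1)),
    row.names = NULL
  )
}

# One row per species, 50 observations each
iris_summary <- summarise_by_group(iris, "Sepal.Length", "Species")
print(iris_summary)
```

The data stay in one "long" table with a column per variable, which is the layout the first objective refers to.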

Prerequisites

The course assumes no prior knowledge of R. It is designed for biologists with an interest in analyzing high-dimensional and/or computationally intensive data. The only prerequisites are a basic understanding of biological experimental design (controls, biological replicates, technical replicates, etc.) and a computer.

Before the first class you should make sure your computer is ready to go. There are several computing options to choose from.

  • You can use your personal computer. Most/all of the software we use in the course is available for Mac and PC and will run directly on your machine. If you own the computer, just install the programs below. If it is a lab computer, have your IT admin install the programs.
  • You may have access to a lab server or cloud service running RStudio Server. This will be Linux-based and will run everything we will be using in the course.
  • You can register for the course at the Ohio Supercomputer Center. This will be free for you to use for the duration of the course. Access will be terminated at the end of the course. If you would like to use this option, please email [email protected]. I will send you an invitation.
  • You can use an existing academic account at the Ohio Supercomputer Center or your institution's equivalent. The cost for what we will be doing will be tiny, and this has the advantage of being a computing environment you may already be familiar with and have used or will use for your work.

RStudio Cloud (now Posit Cloud) is not a great option for the course or for your academic research. It is subscription-based and the rates are exorbitant compared to what you will pay at a supercomputing center at your academic institution.

The best computing option for this class will be what you wish to use for your own research projects.

If you wish to work on your local computer, here are links to the programs we will be using:

Windows and Mac users: see this note on installing Rtools and Xcode.

You should also register for a free GitHub account. Choose a username that you would be OK with putting in a publication.

Course structure

This workshop starts from the basics and moves through somewhat advanced topics. There will be no formal homework or assignments, but you will want to be comfortable with the topics previously presented by the time the next class arrives. It will help to read the course material in advance. If there are things that don't make sense or are causing you trouble at first, I encourage you to try to figure them out using Google and Stack Overflow. This is the best way to learn. Your questions or problems will have been encountered before. Try running the code in the class notes or R script as we go through the lecture. Then expand your horizon and work on your own data.

Although I will try to address all conceptual questions, we will be unable to troubleshoot individual technical issues during class. If you have questions, comments or problems getting things to work, and we don't get to them by the end of class, you can post these issues here. You probably aren't the only person with that question or problem, so posting it in this forum will benefit others.

Each lesson will be structured in the following way:

  • 5 min for people to log in and enter any pre-existing questions they have in the Q&A. These can be questions from the prior week or questions about the current day's material. I will cover what I can during the lecture period.
  • 45 min for me to demonstrate the concepts in the day's lecture
  • 10 min for discussion and any additional questions that arise.

Follow along with the workshop project on GitHub

2023 Curriculum

This course covers a relatively wide range of topics which may be intimidating for new R users. Don't worry if you don't get it all the first time through. The lectures will be recorded and the code and notes will be published for your reference, so you can go back and review what you may have missed.

The first three lectures will present some basics in using R. The last three will be more advanced.

Even intermediate-level users with some pre-existing experience using R will likely learn some helpful information in the early lectures.

Week 3: More advanced concepts in R - April 26, 10-11 AM ET

  • Thanks for attending
  • I appreciate feedback on things that are or are not working.
  • Your comments will help improve the course for next year
  • Please email [email protected]

datascience.curriculum's People

Contributors

blaserlab

datascience.curriculum's Issues

error when starting R in a new project if you didn't set up git

This happens because the startup script for the project template includes a function to check which git branch you are on. (Git branch is important if you are collaborating via git but less so if you are just using it to track your own work.)

This error can be fixed by running the git setup commands in git_commands.R. You only have to run these once.

If you don't want to use git, you can also just ignore the error.

I will leave this issue open and try to discuss at the beginning of the next class.
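To tell whether this is the cause in a given project, one quick check is whether the project directory has been initialised as a git repository at all. The sketch below is an assumption-laden illustration: `has_git_repo` is a hypothetical helper, not part of the project template, and it only looks for a `.git` directory at the top of the project.

```r
# Hypothetical helper: TRUE if `path` contains a .git directory, which is what
# initialising a repository (for example with usethis::use_git()) creates.
has_git_repo <- function(path = ".") {
  dir.exists(file.path(path, ".git"))
}

# Run from the project's top-level directory
if (has_git_repo()) {
  message("This project is a git repository.")
} else {
  message("No git repository here yet; the branch check at startup will fail.")
}
```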

Default branch error

I got behind on classes and am just finishing up lesson one. I was getting an error and tried to figure it out on my own, but I am very lost now.

I thought I had gotten through adding the git pane and posting the repository to GitHub, but some things didn't look right. My console was showing [master] before each line of code. Also, my repository was appearing on GitHub but it was called baseproject, unlike yours, which was called rclass_project_2023. After this I realized I was working in the wrong project. I deleted the baseproject repository from GitHub. I went into rclass_project_2023 in RStudio and repeated the initialization steps. Now I am getting this error when I try to run the initialize GitHub command.

> usethis::use_github(private = FALSE)
Error in git_default_branch():
! Can't determine the local repo's default branch.
Run rlang::last_trace() to see where the error occurred.
> rlang::last_trace()
<error/error_default_branch>
Error in git_default_branch():
! Can't determine the local repo's default branch.

Backtrace:

  1. └─usethis::use_github(private = FALSE)
  2. └─usethis::git_default_branch()
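One common reason for this message is that the repository was initialised but nothing has been committed yet, so there are no local branches to inspect. I can't confirm that is what happened here, but it is easy to check. The sketch below assumes the gert package is installed (usethis relies on it); `local_branch_names` and `diagnose_default_branch` are hypothetical helpers written for this illustration.

```r
# Hypothetical helper: names of the local branches in a repository.
# gert::git_branch_list() returns one row per branch; a repository with no
# commits has no branches, so the result is empty.
local_branch_names <- function(repo = ".") {
  gert::git_branch_list(local = TRUE, repo = repo)$name
}

# Hypothetical helper: turn the branch list into a plain-language hint.
diagnose_default_branch <- function(repo = ".") {
  branches <- local_branch_names(repo)
  if (length(branches) == 0) {
    "No local branches: make a first commit, then rerun usethis::use_github()."
  } else if (!any(c("main", "master") %in% branches)) {
    paste0(
      "Local branches found (", paste(branches, collapse = ", "),
      ") but none is named main or master."
    )
  } else {
    "A conventional default branch exists; the cause is probably something else."
  }
}
```

Calling `diagnose_default_branch()` from the project directory prints the hint for that project.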

Problems running git commands for the first time

usethis::use_readme_md(open = FALSE)
√ Writing 'README.md'
usethis::use_git()
√ Initialising Git repo
There are 8 uncommitted files:

  • '.gitignore'
  • '.Rbuildignore'
  • '.Rprofile'
  • 'LICENSE.md'
  • 'R/'
  • 'rclass_example.Rproj'
  • 'README.md'
  • 'renv/'
Is it ok to commit them?

1: Yes
2: No
3: Absolutely not

Selection: 1
√ Adding files
√ Making a commit with message 'Initial commit'
Error in libgit2::git_signature_default :
config value 'user.name' was not found
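The last two lines mean git has no author identity configured, so the commit cannot be signed with a name and email. A minimal sketch of the one-time fix follows, assuming usethis is installed. The name and email are placeholders; use the ones you want attached to your commits.

```r
# Set the git author identity once for your user account (global scope).
# "Jane Doe" and the email address are placeholders, not real course values.
usethis::use_git_config(
  user.name = "Jane Doe",
  user.email = "jane.doe@example.org"
)
```

After this, rerunning the commit should get past the `user.name` error.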

Error installing 'blaseRtemplates'

Error installing 'blaseRtemplates' despite latest version of R and RStudio

renv::install("blaserlab/blaseRtemplates")
Retrieving 'https://api.github.com/repos/blaserlab/blaseRtemplates/tarball/22431320e2c1ab2f030c14075b8721d96068c789' ...
OK [downloaded 19.6 Kb in 1.1 secs]
Installing prompt [1.0.1] ...
OK [linked cache]
Installing drat [0.2.2] ...
OK [linked cache]
Installing pkgbuild [1.3.1] ...
OK [linked cache]
Installing pkgload [1.2.4] ...
OK [linked cache]
Installing sessioninfo [1.2.2] ...
OK [linked cache]
Installing xopen [1.0.0] ...
OK [linked cache]
Installing rcmdcheck [1.4.0] ...
OK [linked cache]
Installing remotes [2.4.2] ...
OK [linked cache]
Installing brew [1.0-7] ...
OK [linked cache]
Installing commonmark [1.8.0] ...
OK [linked cache]
Installing roxygen2 [7.1.2] ...
OK [linked cache]
Installing rversions [2.1.1] ...
OK [linked cache]
Installing brio [1.1.3] ...
OK [linked cache]
Installing praise [1.0.0] ...
OK [linked cache]
Installing diffobj [0.3.5] ...
OK [linked cache]
Installing waldo [0.4.0] ...
OK [linked cache]
Installing testthat [3.1.3] ...
OK [linked cache]
Installing devtools [2.4.3] ...
OK [linked cache]
Installing pak [0.2.1] ...
OK [linked cache]
Installing blaseRtemplates [0.0.0.9088] ...
FAILED
Error installing package 'blaseRtemplates':
===========================================

installing source package 'blaseRtemplates' ...
** using staged installation
** R
Error in parse(outFile) :
C:/Users/katha/AppData/Local/Temp/RtmpSOimFS/renv-package-new-7a4c45c44d84/blaseRtemplates/R/bb_renv_datapkg.R:28:72: unexpected '>'
27: } else {
28: latest_version <- file.info(list.files(path, full.names = T)) |>
_______________________________________________________________^
ERROR: unable to collate and parse R files for package 'blaseRtemplates'
removing 'C:/Users/katha/R_Workshop/renv/staging/5/blaseRtemplates'
Error: install of package 'blaseRtemplates' failed [error code 1]
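The parser stops at `|>`, the native pipe operator, which R only understands from version 4.1.0 onward. That pattern usually means the R session doing the install is older than 4.1.0, even if a newer R is installed elsewhere on the machine. I can't verify that from the log alone, so the sketch below just checks the running session; `supports_native_pipe` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if the given R version can parse the native pipe.
# The native pipe was introduced in R 4.1.0.
supports_native_pipe <- function(version = getRversion()) {
  as.numeric_version(version) >= "4.1.0"
}

# Check the session that is actually running the install
message("Running R ", as.character(getRversion()))
if (!supports_native_pipe()) {
  message("This R is older than 4.1.0 and cannot parse code that uses |>.")
}
```

If the check fails inside RStudio, the R version RStudio is pointed at is worth a look before retrying the install.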

Problem with establish_new_bt

Hi Brad,

when I tried to run the line
establish_new_bt(cache_path = "<some_directory>/r_4_2_cache", project_path = "<some_directory>/projects")

I got the following error:
Error in establish_new_bt(cache_path = "<some_directory>/r_4_2_cache", :
could not find function "establish_new_bt"

How should I proceed? I have not tried any of the subsequent code yet.

thanks!

Polina
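"could not find function" means R cannot see the function in the current session: either the package that provides it is not installed, or it is installed but not attached with library(). The sketch below separates those cases. It assumes establish_new_bt is exported by blaseRtemplates, which the issue does not state; `find_function_source` is a hypothetical helper.

```r
# Hypothetical helper: explain why a function might not be found.
find_function_source <- function(fn_name, pkg) {
  if (exists(fn_name, mode = "function")) {
    return("attached")
  }
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return("package not installed")
  }
  if (fn_name %in% getNamespaceExports(pkg)) {
    return("installed but not attached")
  }
  "not exported by package"
}

# Assumed package name; adjust if the function lives elsewhere
print(find_function_source("establish_new_bt", "blaseRtemplates"))
```

If the answer is "installed but not attached", either attach the package with library() or call the function with the `pkg::` prefix.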

Trouble loading 10X data into R using load_process.R

A person in the class commented:

"I tried to run through the code we covered today for single-cell sequencing data analysis. I am wondering how we get a processed file for R, like the 'vignette_cds' shown today, from the results of a 10x Cloud analysis. I uploaded fastq.gz files from the Chromium Controller and Chromium X to 10x Cloud Analysis and created a new analysis. The results include multiple files, such as 'Feature / cell matrix (filtered)' and 'Feature / cell matrix HDF5 (filtered)'."

error in git commit

Hi Brad,

I practiced with the iris dataset, but I keep receiving the following error when trying to use the git_commit command. The changes that I am trying to commit are not appearing on git.

Error in gert::git_commit("lecture 2 practice with iris dataset") :
No staged files to commit. Run git_add() to select files.
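The message says that nothing has been staged yet: git only commits files that were first added to the staging area. A minimal sketch with gert follows, assuming gert is installed and git has a user name and email configured. `stage_and_commit` is a hypothetical wrapper, not a function from the course code.

```r
# Hypothetical wrapper: stage files, then commit only if something is staged.
stage_and_commit <- function(msg, files = ".", repo = ".") {
  # "." stages every changed file in the repository
  gert::git_add(files, repo = repo)
  staged <- gert::git_status(staged = TRUE, repo = repo)
  if (nrow(staged) == 0) {
    message("Nothing to commit: no changes were found to stage.")
    return(invisible(NULL))
  }
  gert::git_commit(msg, repo = repo)
}
```

For example, `stage_and_commit("lecture 2 practice with iris dataset")` run from the project directory stages all changes and commits them. If it reports nothing to commit, the edited files may not have been saved, or they may live outside the project directory.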
