
Data Science for Biomedical Informatics (BMIN503/EPID600)

This repository contains files used for a University of Pennsylvania course taught in Fall 2020. The main course website contains further details for students enrolled in the course.

Course Description

Data science refers broadly to using statistics and informatics techniques to gain insights from large datasets. Biomedical informatics refers to a range of disciplines that use computational approaches to analyze biomedical data, both to answer pre-specified questions and to generate novel hypotheses. In this course, we will use R and other freely available software to learn the fundamentals of data science as applied to a range of biomedical informatics topics, including those making use of health and omics data. After completing this course, students will:

  • Be able to retrieve and clean data, perform exploratory analyses, build models to answer scientific questions, and present visually appealing results to accompany data analyses.
  • Be familiar with various biomedical data types and resources related to them.
  • Know how to create reproducible and easily shareable results with R and GitHub.

Course Director:
Blanca E Himes, PhD
Associate Professor of Informatics

Guest Lecturers:
Elizabeth Grice, PhD
John Holmes, PhD
Jesse Yenchih Hsu, PhD
Mengyuan Kan, PhD
Ari Klein, PhD
William La Cava, PhD
Erin Schnellinger, MS
Ryan Urbanowicz, PhD
Sherrie Xie, BS

TA:
Alexa Woodward, MS

Expectations

You are expected to view all sessions of the course, participate in virtual class discussions, and complete required exercises and the class project. You need a computer to participate fully in lectures and online activities. You must be familiar with this computer and able to install free programs on it.

Grading: The course is graded on a letter grade basis, according to the following proportions:

  • 40% assignments (6 total)
  • 40% biomedical data science project
  • 20% participation in class and lab sessions
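
As a rough illustration of how these proportions combine, here is a minimal sketch. Only the three weights and the count of six assignments come from the breakdown above. The function name `weighted_score`, the 0-100 scale for each component, and the equal weighting of the six assignments are assumptions made for the example, not stated course policy. The syllabus does not say how an overall score maps to a letter grade, so no mapping is shown.

```python
# Weights taken from the grading breakdown; everything else is assumed.
WEIGHTS = {"assignments": 0.40, "project": 0.40, "participation": 0.20}


def weighted_score(assignment_scores, project_score, participation_score):
    """Combine component scores, each assumed to be on a 0-100 scale."""
    if len(assignment_scores) != 6:
        raise ValueError("expected scores for all 6 assignments")
    # Assumption: the six assignments count equally toward the 40%.
    assignments_avg = sum(assignment_scores) / len(assignment_scores)
    return (
        WEIGHTS["assignments"] * assignments_avg
        + WEIGHTS["project"] * project_score
        + WEIGHTS["participation"] * participation_score
    )


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(weighted_score([90, 85, 95, 88, 92, 90], 80, 100))
```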

Format

The course will be held virtually. Asynchronous lectures and practicum materials for working through computational exercises will be available weekly. Six assignments will be due throughout the semester. A final project requiring a substantial amount of work and creativity will be due at the end of the semester. This project is in lieu of a final exam. Students will be encouraged to work independently and seek help as needed. Synchronous meetings will occur most Tuesdays and Thursdays, 1:30-2:30 pm, to discuss material covered in recorded lectures and answer student questions. Check Canvas frequently for the latest announcements.

Assignments

Due dates for the assignments: 9/10/20, 9/24/20, 10/8/20, 10/22/20, 11/5/20, 11/19/20.

Biomedical Data Science Project

The final project will answer a question selected by each student using biomedical data and some of the tools presented during the course. After choosing a topic, each student will identify three faculty members, staff scientists, or postdocs from different departments or fields who can give feedback and help define a specific, novel, and interdisciplinary question. Although use of publicly available data for the project is encouraged, students may use an appropriate private data source. Students will work on these projects throughout the semester, with final project reports due on 12/11/20. Grading will be based on three project components:

  1. An HTML document generated from R Markdown that describes the question, source of data, analysis, and results
  2. A GitHub repository that contains an organized project
  3. An online presentation describing the work to classmates

Textbooks

There are many free online resources to learn R. The following two textbooks are suggested but not required for those students who prefer to have a printed reference:

  • Lander JP, “R for Everyone: Advanced Analytics and Graphics,” Addison-Wesley Professional (2014)
  • Wickham H and Grolemund G, “R for Data Science,” O’Reilly (2016)

Prerequisites

Familiarity with basic statistical concepts (e.g., from EPID 526/7 or another first-year graduate-level statistics course) is expected, as this course will not cover them in depth. A background in biology and computing would be helpful, but no formal requirements will be enforced.

Academic Honesty

All work submitted for credit is expected to be your own work. In the preparation of all papers and other written work, you should always take great care to distinguish your own ideas and knowledge from information derived from other sources. The term “sources” includes not only published primary and secondary material, but also information and opinions gained directly from other people. The responsibility for learning the proper forms of citation lies with you. You must acknowledge any collaboration and its extent in all submitted work. You are expected to follow Penn’s standards of academic integrity as found here.

Students with Disabilities

The University of Pennsylvania provides reasonable accommodations to students with disabilities who have self-identified and been approved by the office of Student Disabilities Services (SDS). Please make an appointment to meet with me as soon as possible to discuss your accommodations and your needs. If you have not yet contacted SDS and would like to request accommodations or have questions, you can make an appointment by calling (215) 573-9235. The SDS office is located in the Weingarten Learning Resources Center at Stouffer Commons, 3702 Spruce Street, Suite 300. All SDS services are free and confidential. Please visit the SDS website for more details.

The titanic3.csv file was obtained here.
