Spring 2023 Tidyverse create and extend assignments
Initial Description and Links:
In this assignment we practice collaborating around a code project with GitHub. We practice our knowledge of TidyVerse functions by creating vignette examples of the packages. I am using a births dataset from fivethirtyeight.com. This dataset contains U.S. births data for 1994-2003, which is provided by the Centers for Disease Control and Prevention’s (CDC’s) National Center for Health Statistics (NCHS).
In the extension part of this assignment I chose Alic's work and used the TidyVerse, specifically the dplyr package, to demonstrate its capabilities. I used dplyr's filter() and summarize() functions together with the base R functions sum() and mean(), combined with group_by(), which allowed us to perform each operation “by group”.
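The "by group" pattern described above can be sketched as follows. This is a minimal illustration only: the `births_toy` tibble and its column names are assumptions made up for the example, not the actual FiveThirtyEight data or Alic's code.

```r
library(dplyr)

# Hypothetical toy data; the real births dataset has different columns and values
births_toy <- tibble(
  year   = c(1994, 1994, 1995, 1995),
  month  = c(1, 2, 1, 2),
  births = c(100, 120, 110, 130)
)

births_by_year <- births_toy %>%
  filter(year >= 1994) %>%          # keep only the rows of interest
  group_by(year) %>%                # every later summary is computed per year
  summarize(
    total_births = sum(births),     # base R sum(), applied within each group
    mean_births  = mean(births)     # base R mean(), applied within each group
  )
```

Because group_by() comes before summarize(), the result has one row per year instead of a single overall row.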
Initial Description and Link:
For this assignment, we'll be practicing our knowledge of Tidyverse functions by creating vignette examples of the packages that make up Tidyverse. In this project, my goal is to create a programming example or “vignette” that showcases the capabilities of a TidyVerse package, along with a dataset from either fivethirtyeight.com or Kaggle. The aim of this example is to demonstrate how to effectively use the selected TidyVerse package to manipulate, analyze, and visualize the selected dataset.
Initial Description and Link:
For this assignment, we'll be practicing our knowledge of Tidyverse functions by creating vignette examples of the packages that make up Tidyverse. In my case, I chose to cover the forcats package, which focuses on manipulating factor columns in a data frame, because I have no prior experience with it.
For my extend I have extended Kory Martin's dplyr create assignment. This is located here.
Initial Description and Link:
I've chosen ggplot2 as my tidyverse package to showcase and worked with a dataset from Kaggle showing the number of internet users for various countries between the years 1980 and 2020.
For my extend, I've chosen to build upon Farhana's implementation of dplyr. This is located here.
# Glen Davis
A vignette of example use cases for the purrr library within the tidyverse.
For this create assignment, I used the following packages and functions to analyze the college majors dataset from FiveThirtyEight.
Package | Function |
---|---|
readr | read_csv() |
dplyr | glimpse(), group_by(), summarise(), mutate() |
ggplot2 | ggplot(), geom_bar(), scale_x_continuous(), scale_y_continuous(), labs(), xlab(), ylab(), ggtitle(), theme(), coord_flip() |
Links:
- FiveThirtyEight
- Github
- RPubs
Tidyverse SELECT
I worked with the lubridate package from the Tidyverse and created examples exploring NYC Filming Permits data.
Tidyverse EXTEND
To extend an example, I used Keith's fuzzyjoin vignette to work on MTA subway locations and NYC public hospitals.
TIDYVERSE CREATE
For this assignment, dplyr was used to conduct a superficial analysis of the 'Music Dataset : 1950 to 2019', which provides a list of songs 'from 1950 to 2019 describing music metadata as sadness, danceability, loudness, acousticness, etc.' More specifically, dplyr was used to analyze aspects of the 'sadness' variable to demonstrate the main functions of the dplyr package. The final product is a playlist of songs from 1950 to 2019 that includes the single 'saddest' song from each year.
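One way to pick the top song per year with dplyr is sketched below. The `songs_toy` tibble and its column names (`track_name`, `release_date`, `sadness`) are assumptions for illustration, and slice_max() is one possible approach rather than necessarily the one used in the vignette.

```r
library(dplyr)

# Hypothetical toy data standing in for the music dataset
songs_toy <- tibble(
  track_name   = c("A", "B", "C", "D"),
  release_date = c(1950, 1950, 1951, 1951),
  sadness      = c(0.2, 0.9, 0.5, 0.1)
)

saddest_per_year <- songs_toy %>%
  group_by(release_date) %>%
  # keep the single row with the highest sadness score in each year;
  # with_ties = FALSE guarantees exactly one song per year
  slice_max(sadness, n = 1, with_ties = FALSE) %>%
  ungroup()
```

The result has one row per year, which can then be arranged by year to form the playlist.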
TIDYVERSE EXTEND
The overarching purpose of this assignment was to use GitHub as a collaborative coding tool and to explore its push, pull, clone, and fork capabilities. In this assignment a GitHub repository was cloned, another student's vignette .Rmd file was modified, and the modified .Rmd file was then submitted back to the original GitHub repository through a pull request to demonstrate GitHub's collaboration capabilities.
The vignette which was modified was created by Jlok17. The vignette was modified to improve ggplot readability. More specifically, the plot was reordered and observation value labels were introduced to a scatter plot.
A vignette of example use cases for the purrr library within the tidyverse.
Readr Vignette -
Yahoo Finance Kaggle Dataset:
Github:
Rpubs:
Purrr Extension -
Github:
gdd - extending Mo's tidyverse create submission - starting lines for changes/comments:
- line 29: combined your string replacements and class coercion into one line
- line 35: reordered/simplified your group_by/summarize workflow
- line 66: adjusted your bar plot so that the x-/y-values are in more standard positions (i.e. the x is a category, and the y is numeric), your bars are sorted, and then the coords are flipped so you still achieve what you wanted visually: a horizontal bar plot. This is better than setting the y-value as a category and the x-value as numeric and not doing the coord flip, as it's easy to get confused when you do it that way. Also updated your Amounts since you wanted them to represent millions of dollars, not dollars.
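A rough sketch of the kind of changes described in these notes is shown below. The `donations_toy` data, its column names, and the exact cleaning regex are assumptions for illustration; they are not taken from Mo's submission.

```r
library(dplyr)
library(stringr)
library(ggplot2)

# Hypothetical toy data: amounts stored as formatted strings
donations_toy <- tibble(
  Owner  = c("Owner A", "Owner B", "Owner C"),
  Amount = c("$1,500,000", "$250,000", "$3,000,000")
)

donations_clean <- donations_toy %>%
  # string replacement and class coercion combined in one line,
  # then rescaled so the values represent millions of dollars
  mutate(Amount_millions = as.numeric(str_replace_all(Amount, "[$,]", "")) / 1e6)

# x is the category (sorted by value), y is numeric; coord_flip() then
# turns the chart into a horizontal bar plot
p <- ggplot(donations_clean,
            aes(x = reorder(Owner, Amount_millions), y = Amount_millions)) +
  geom_col() +
  coord_flip() +
  labs(x = "Owner", y = "Amount (millions of dollars)")
```

Keeping the category on x and the number on y means the aesthetics read the same as in a vertical bar plot, and only the final coord_flip() changes the orientation.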
Initial Description and Link:
For this assignment, I decided to use ggplot2 and an associated map package to create visualizations for data analysis.
Initial Description:
The purpose of this markdown is to provide an introduction to the following 3 packages: forcats, dplyr, and ggplot2. This vignette shows how using these 3 packages from the larger tidyverse package can help the user enhance data visuals (using ggplot2).
The objective of this assignment was twofold: (1) to practice collaborating around a code project with GitHub and (2) to use a capability of tidyverse and demonstrate it with a vignette. The code was submitted with a pull request to the GitHub repository https://github.com/acatlin/SPRING2023TIDYVERSE.
The dataset I chose to work with is data that I obtained from working with the Franklin Community Center. The Franklin Community Center is a nonprofit organization that aims to help families and individuals in Saratoga County. They have been in operation for 40 years and their Food Pantry has been operational since 2018. In 2019, the Food Pantry began using the Oasis database to manage their cases. Each family or individual is assigned a case number, and every time a person from the case comes in to receive a service it is documented. I worked with the Oasis team to understand how to extract data from their database. With the data I extracted, my goal is to create visualizations showing which parts of NY the food bank services are going to.
The tidyverse capabilities that I wanted to demonstrate using the dataset are extensions of ggplot2. The Simple Features for R, or sf, package can be used in conjunction with ggplot2 in tidyverse to create maps. Additionally, the treemapify package can be used with ggplot2 to make treemaps.
For the tidyverse extend, I chose to work with Taha's code and to demonstrate more of the functionality of the forcats package.
This vignette looks at the tidyverse package ggplot2. The purpose of this vignette is to explain the basics of how ggplot2 works and how we can make effective graphs. A data set from the healthcare field was picked at random from Kaggle to plot using ggplot2.
In the tidyverse extend assignment I extended Alex Khaykin's vignette. Below is the GitHub link for that extension:
Initial Description and Link:
In this assignment we will get to practice collaborating around a code project with GitHub. We will create an example using one or more TidyVerse packages and demonstrate how to use their capabilities. I will use a births dataset from 'fivethirtyeight.com'.
TidyVerse Create
TidyVerse Extend
Initial Description and Link:
The dataset I used was obtained from Kaggle. It contains the amounts of political donations given by American sports owners to political campaigns and Political Action Committee organizations. Using dplyr, stringr, and ggplot2 from tidyverse, I explored various questions about the dataset.
Initial Description and Link:
For this assignment, I chose the dplyr library in Tidyverse to show how to work with a data frame built from the Netflix TV Shows and Movies dataset, which was pulled from Kaggle.
For the extend portion of this assignment, I looked at the code originally created by classmate Coco Donavon, here.
This vignette demonstrates some of the capabilities of the tidyr package from the tidyverse suite. It also utilizes dplyr and ggplot2 functions.
The data set used was from FiveThirtyEight.com and it focused on Elo ratings and other metrics for NBA basketball teams.
Links:
I chose to extend John Cruz's vignette on the lubridate package by focusing on the as_date() function.
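As a brief illustration of what as_date() does, here is a minimal sketch. The input values are made up for the example and are not taken from John Cruz's vignette or from the extension itself.

```r
library(lubridate)

# A date-time value (hypothetical example input)
stamp <- ymd_hms("2023-03-15 23:30:00", tz = "UTC")

# as_date() drops the time-of-day component and returns a Date
date_from_stamp <- as_date(stamp)

# as_date() also parses ISO 8601 formatted character strings
date_from_text <- as_date("2023-03-15")
```

Both results are plain Date objects, so they can be compared, grouped on, or plotted without the time component getting in the way.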
Links:
- Extended Alex Khaykin's analysis of Congressmembers' ages.
Initial Description and Link:
For this assignment, I will be creating a programming sample vignette to demonstrate the use of the tidyr package from the tidyverse. I will be working with the “Video Game Sales” (https://www.kaggle.com/datasets/gregorut/videogamesales) dataset from Kaggle. The dataset was generated from a scrape of vgchartz.com and contains the sales of video games that sold more than 100,000 copies from 1980 to 2020.
This is my extension to Alex Khaykin's vignette on the ggplot2 package in the tidyverse. His "Create" assignment looked at key plots in ggplot2 using the 'congress_age' dataset from fivethirtyeight. So far, he has demonstrated how to create a bar plot, boxplot, violin plot, and a scatterplot. I will expand on this by creating a density plot and histogram as well as showing useful components in the ggplot2 package to improve data visualization.
This vignette demonstrates how to use the str_replace_all function within the tidyverse's stringr package in the context of tabular data. The data is a compilation of 2,498 articles about data science, pulled from the data science site Kaggle. See link here:
Initial Description and Links:
For this assignment, we explore TidyVerse and how to use some of its features. In my case, I use ggplot2 to explore recent college graduates and unemployment by field of study.
This vignette will introduce the fuzzyjoin package, which enables joining of two datasets based on imperfect matches. This package is very helpful for combining data without unique keys.
Initial Description and Links:
In this assignment we will get to practice collaborating around a code project with GitHub. We will create an example using one or more TidyVerse packages and demonstrate how to use their capabilities.
We will use the Bob Ross Dataset from "Fivethirtyeight.com". This data set records the different elements that appear in Bob Ross's paintings. Here we will use the Tidyverse package to show general trends and an analysis of these elements.
Using a dataset containing NCAA Women's Basketball rosters for Division I, I performed a basic analysis of the rosters using readr, dplyr and ggplot2.