d-ev-craig / spring2023tidyverse

This project forked from abnormalpotassium/spring2023tidyverse

Spring 2023 Tidyverse create and extend assignments

spring2023tidyverse's Introduction

SPRING2023TIDYVERSE

Spring 2023 Tidyverse create and extend assignments

Farhana Akther

Initial Description and Links:

In this assignment we will get to practice collaborating around a code project with GitHub. We will be practicing our knowledge of TidyVerse functions by creating vignette examples of the packages. I am using a births dataset from fivethirtyeight.com. This dataset contains U.S. births data for 1994 - 2003, which is provided by the Centers for Disease Control and Prevention’s (CDC’s) National Center for Health Statistics (NCHS).

fivethirtyeight

TIDYVERSE EXTEND:

In the extension part of this assignment I have chosen Alice's work and used the TidyVerse, specifically the dplyr package, to demonstrate its capabilities. I have used dplyr to manipulate the dataset with filter() and summarize(), together with sum() and mean(), combined with group_by(), which allowed us to perform our operations “by group”.
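As a rough illustration of the group-wise workflow described above, the sketch below runs filter(), group_by(), and summarize() on a small made-up table. The column names (`year`, `month`, `births`) and all values are hypothetical stand-ins, not the actual fivethirtyeight file.

```r
library(dplyr)

# Hypothetical toy data standing in for the births dataset
births <- tibble(
  year   = c(1994, 1994, 1995, 1995),
  month  = c(1, 2, 1, 2),
  births = c(100, 120, 90, 110)
)

births_by_year <- births %>%
  filter(year >= 1994) %>%          # keep only the rows of interest
  group_by(year) %>%                # later steps operate "by group"
  summarize(
    total_births = sum(births),     # one total per year
    mean_births  = mean(births)     # one average per year
  )

print(births_by_year)
```

Because summarize() is called on grouped data, it returns one row per year rather than a single overall row.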

Waheeb Algabri

Initial Description and Link:

For this assignment, we'll be practicing our knowledge of Tidyverse functions by creating vignette examples of the packages that make up Tidyverse. In this project, my goal is to create a programming example or “vignette” that showcases the capabilities of a TidyVerse package, along with a dataset from either fivethirtyeight.com or Kaggle. The aim of this example is to demonstrate how to effectively use the selected TidyVerse package to manipulate, analyze, and visualize the selected dataset.

Rpubs

Taha A

Initial Description and Link:

For this assignment, we'll be practicing our knowledge of Tidyverse functions by creating vignette examples of the packages that make up Tidyverse. In my case, I wanted to attempt going over the forcats package which focuses on manipulating factor elements in a dataframe, as I have no experience with using it at this point.

For my extend I have extended Kory Martin's dplyr create assignment. This is located here.

Alice D

Initial Description and Link:

I've chosen ggplot2 as my tidyverse package to showcase and worked with a dataset from Kaggle showing the number of internet users for various countries between the years 1980 and 2020.

For my extend, I've chosen to add upon Farhana's implementation of dplyr. This is located here.

Susanna W

For this create assignment, I used the following packages and functions to analyze the college majors dataset from FiveThirtyEight.

| Package | Function |
|---|---|
| `readr` | `read_csv()` |
| `dplyr` | `glimpse()` `group_by()` `summarise()` `mutate()` |
| `ggplot2` | `ggplot()` `geom_bar()` `scale_x_continuous()` `scale_y_continuous()` `labs()` `xlab()` `ylab()` `ggtitle()` `theme()` `coord_flip()` |

Links:

John Cruz

Tidyverse SELECT

I worked using the lubridate package within the Tidyverse ensemble. With this, I created examples exploring NYC Filming Permits data.

Extension by Jacob Silver

Tidyverse EXTEND

To extend an example, I used Keith's fuzzyjoin vignette to work with MTA subway locations and NYC public hospitals.

Gregg Maloy

TIDYVERSE CREATE

For this assignment dplyr was utilized to conduct a superficial analysis of the 'Music Dataset : 1950 to 2019' which provides a list of songs 'from 1950 to 2019 describing music metadata as sadness, danceability, loudness, acousticness, etc.' More specifically, dplyr was used to analyze aspects of the 'sadness' variable to demonstrate the main functions of the dplyr package. The final product is a playlist of songs between the years 1950-2019 which includes the top one 'saddest' song from each year.

TIDYVERSE EXTEND

The overarching purpose of this assignment was to use GitHub as a collaborative coding tool and to explore its push, pull, clone, and fork capabilities. In this assignment a GitHub repository was cloned, another student's vignette .rmd file was modified, and the modified .rmd file was then submitted back to the original GitHub repository to demonstrate GitHub's collaboration capabilities.

The vignette which was modified was created by Jlok17. The vignette was modified to improve ggplot readability. More specifically the plot was reordered and observation value labels were introduced to a scatter plot.

Glen Davis: Create:

A vignette of example use cases for the purrr library within the tidyverse.

Daniel Craig

Readr Vignette -

Yahoo Finance Kaggle Dataset:

Github:

Rpubs:

Daniel Craig

Purrr Extension -

Github:

https://github.com/d-ev-craig/DATA607/blob/main/TIDYVERSE%20Create/purrr%20Vignette%20Extension/purrr_Vignette_ext_dcraig.Rmd

Rpubs:

https://rpubs.com/devcraig/DATA607purrrVig

Glen Davis: Extend:

gdd - extending Mo's tidyverse create submission - starting lines for changes/comments:

- line 29: combined your string replacements and class coercion into one line
- line 35: reordered/simplified your group_by/summarize workflow
- line 66: adjusted your bar plot so that the x-/y-values are in more standard positions (i.e. the x is a category, and the y is numeric), your bars are sorted, and then the coords are flipped so you still achieve what you wanted visually: a horizontal bar plot. This is better than setting the y-value as a category and the x-value as numeric and not doing the coord flip, as it's easy to get confused when you do it that way. Also updated your Amounts since you wanted them to represent millions of dollars, not dollars.
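The bar-plot adjustment described above (categorical x, numeric y, sorted bars, then a coordinate flip) might look roughly like the sketch below. The data frame, its column names (`owner`, `amount`), and the values are hypothetical placeholders rather than the actual donations data, and this is only one way to get a sorted horizontal bar plot.

```r
library(ggplot2)

# Hypothetical totals, assumed to be already expressed in millions of dollars
donations <- data.frame(
  owner  = c("Owner A", "Owner B", "Owner C"),
  amount = c(2.5, 0.8, 1.2)
)

# x is the category (ordered by amount), y is numeric;
# coord_flip() then turns the vertical bars into horizontal ones
p <- ggplot(donations, aes(x = reorder(owner, amount), y = amount)) +
  geom_col() +
  coord_flip() +
  labs(x = "Owner", y = "Amount (millions of USD)")

p
```

Keeping the category on x and flipping afterwards means the aesthetics read the same way as in an ordinary vertical bar plot.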

Eddie Xu

Initial Description and Link:

For this assignment, I decided to use ggplot2 and an associated mapping package to create visualizations for data analysis.

Rpubs for Extend

Gabriel Castellanos

Initial Description:

The purpose of this markdown is to provide an introduction to the following 3 packages: forcats, dplyr, and ggplot2. This vignette shows how using these 3 packages from the larger tidyverse package can help the user enhance data visuals (using ggplot2).

Kayleah Griffen

The objective of this assignment was twofold: (1) to practice collaborating around a code project with GitHub and (2) to use a capability of tidyverse and demonstrate it with a vignette. The GitHub repository the code was submitted to with a pull request is https://github.com/acatlin/SPRING2023TIDYVERSE.

The dataset I chose to work with is data that I obtained from working with the Franklin Community Center. The Franklin Community Center is a nonprofit organization that aims to help families and individuals in Saratoga County. They have been in operation for 40 years and their Food Pantry has been operational since 2018. In 2019, the Food Pantry began using the Oasis database to manage their cases. Each family or individual is assigned a case number, and every time a person from the case comes in to receive a service it is documented. I worked with the Oasis team to understand how to extract data from their database. With the data I extracted, my goal is to create visualizations showing what parts of NY the food bank services are going to.

The tidyverse capabilities that I wanted to demonstrate using the dataset are extensions of ggplot2. The Simple Features for R package, sf, can be used in conjunction with ggplot2 in tidyverse to create maps. Additionally, the treemapify package can be used with ggplot2 to make treemaps.

Kayleah Griffen Tidyverse Extend

For the tidyverse extend, I chose to work with Taha's code and to demonstrate more of the functionality of the forcats package.

Umer Farooq

Create

This vignette looks at the tidyverse package ggplot2. The purpose of this vignette is to explain how the basics of ggplot2 work and how we can make effective graphs. A healthcare dataset was picked at random from Kaggle to plot using ggplot2.

Extend

In the tidyverse extend assignment I have extended Alex Khaykin's vignette. Below is the GitHub link for that extension:

Alex K

Initial Description and Link:

In this assignment we will get to practice collaborating around a code project with GitHub. We will create an example using one or more TidyVerse packages and demonstrate how to use their capabilities. I will use a births dataset from 'fivethirtyeight.com'.

Miguel Gomez

TidyVerse Create

fivethirtyeight

TidyVerse Extend

Mohamed Hassan

Initial Description and Link:

The dataset I used was obtained from Kaggle. It contains the political donations given by American sports team owners to political campaigns and Political Action Committee organizations. Using dplyr, stringr, and ggplot2 from tidyverse, I explored various questions about the dataset.

Kory Martin

Tidyverse Create:

Initial Description and Link:

For this assignment, I chose the dplyr library in Tidyverse to show how to work with a dataframe containing the Netflix TV Shows and Movies dataset, which was pulled from Kaggle.

GitHub

Rpubs

Tidyverse Extend:

For the extend portion of this assignment, I looked at the code originally created by classmate Coco Donovan, here.

GitHub

Rpubs

Shoshana Farber

Tidyverse CREATE

This vignette demonstrates some of the capabilities of the tidyr package from the tidyverse suite. It also utilizes dplyr and ggplot2 functions.

The data set used was from FiveThirtyEight.com and it focused on Elo ratings and other metrics for NBA basketball teams.

Links:

Tidyverse EXTEND

I chose to extend John Cruz's vignette on the lubridate package by focusing on the as_date() function.
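For readers unfamiliar with it, the sketch below shows a few ways lubridate's as_date() can be called. The timestamp and the variable name `permit_start` are made-up examples, not values from the NYC Filming Permits data.

```r
library(lubridate)

# Hypothetical permit start time (date-time with a time zone)
permit_start <- ymd_hms("2023-03-15 18:30:00", tz = "UTC")

# Drop the time-of-day component, keeping only the calendar date
start_date <- as_date(permit_start)

# as_date() also accepts ISO-formatted strings and day counts since 1970-01-01
from_string <- as_date("2023-03-15")
from_number <- as_date(19431)

print(start_date)
```

All three calls return objects of class Date, which makes them directly comparable with one another.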

Links:

GitHub
Rpubs

Ross Boehme

Ross Create

FiveThirtyEight Article

Ross Extend

Extended Alex Khaykin's analysis of Congressmembers' ages.

Jian Quan Chen

Create

Initial Description and Link:

For this assignment, I will be creating a programming sample vignette to demonstrate the use of the tidyr package in the tidyverse package. I will be working with the “Video Game Sales” (https://www.kaggle.com/datasets/gregorut/videogamesales) dataset from Kaggle. The dataset was generated from a scrape of vgchartz.com and contains the sales of video games that sold greater than 100,000 copies from 1980 to 2020.

Extend

This is my extension to Alex Khaykin's vignette on the ggplot2 package in the tidyverse. His "Create" assignment looked at key plots in ggplot2 using the 'congress_age' dataset from fivethirtyeight. So far, he has demonstrated how to create a bar plot, boxplot, violin plot, and a scatterplot. I will expand on this by creating a density plot and histogram as well as showing useful components in the ggplot2 package to improve data visualization.

Jacob Silver

This vignette demonstrates how to use the str_replace_all function within the tidyverse's stringr package in the context of tabular data. The data is a compilation of 2,498 articles about data science, pulled from the data science site Kaggle. See link here:

https://www.kaggle.com/datasets/arnabchaki/medium-articles-dataset?resource=download
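A minimal sketch of str_replace_all() applied to a column of tabular data is shown below. The tibble, its `title` column, and the replacement rules are hypothetical and are not taken from the articles dataset itself.

```r
library(dplyr)
library(stringr)

# Hypothetical article titles with hyphens and irregular spacing
articles <- tibble(
  title = c("Intro to  Data-Science", "Data-Science   Tips")
)

# A named vector applies several pattern = replacement pairs in order:
# first hyphens become spaces, then runs of whitespace collapse to one space
cleaned <- articles %>%
  mutate(
    title_clean = str_replace_all(title, c("-" = " ", "\\s+" = " "))
  )

print(cleaned)
```

Using mutate() keeps the original column intact, so the cleaned text can be compared against the raw text side by side.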

Joe Garcia

Initial Description and Links:

For this assignment, we explore TidyVerse and how to use some of its features. In my case, I use ggplot2 to explore unemployment among recent college graduates by field of study.

FiveThirtyEight

TidyVerse Extend to Susanna's Github

Keith Colella

This vignette will introduce the fuzzyjoin package, which enables joining of two datasets based on imperfect matches. This package is very helpful for combining data without unique keys.

Joshua Lok

Initial Description and Links:

In this assignment we will get to practice collaborating around a code project with GitHub. We will create an example using one or more TidyVerse packages and demonstrate how to use their capabilities.

We will use the Bob Ross dataset from "Fivethirtyeight.com". This dataset records the different elements of Bob Ross's paintings; here we will use the Tidyverse package to show general trends and an analysis of these different elements.

fivethirtyeight

Coco Donovan

Using a dataset containing NCAA Women's Basketball rosters for Division I, I performed a basic analysis of the rosters using readr, dplyr and ggplot2.

data = 'https://raw.githubusercontent.com/Sports-Roster-Data/womens-college-basketball/main/wbb_rosters_2022_23.csv'
