A recent graduate of Bellevue University's Data Science Graduate program, I am eager to show you what I have learned and what I am capable of. Here you will find a collection of some of my favorite data science projects.
This project presents the same data analysis across several visual media: a Tableau dashboard, a PowerPoint presentation, a blog post, an infographic, and an informative video. Each format shows a different way to visualize the data and a different presentation platform suited to a particular audience.
This project uses image classification on normal, benign, and malignant breast ultrasound images to build a model that can detect whether a specific specimen contains cancer. The business problem is the need for a quick, automated way to test for breast cancer based on a person's ultrasound images. This could speed up the process of diagnosing patients and getting them the treatment they need. Some research questions to potentially answer include the following: With what accuracy and speed can we determine if an ultrasound image contains cancerous masses? What would be the impact if an image is misclassified? What determines whether an image has cancerous masses?
This project takes data on people who were vaccinated, partially vaccinated, and not vaccinated against COVID-19 and provides visual and statistical analysis using R. The main business question was how the US compared to other countries in vaccination numbers. Analysis techniques included CDF and PMF graphs, histograms, Pareto and lognormal distribution curves, scatterplots, and correlation charts. A regression model was also built and presented to see whether future vaccination rates could be predicted.
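To show what the PMF and CDF graphs summarize, here is a minimal sketch of the two calculations. The project itself was done in R; this version is written in Python purely for illustration, and the function names and sample values are made up rather than taken from the project data.

```python
from collections import Counter


def pmf(values):
    """Map each distinct value to its relative frequency."""
    counts = Counter(values)
    total = len(values)
    return {v: counts[v] / total for v in sorted(counts)}


def cdf(values):
    """Map each distinct value to the fraction of observations at or below it."""
    running = 0.0
    result = {}
    for value, probability in pmf(values).items():
        running += probability
        result[value] = running
    return result


# Hypothetical vaccination percentages, rounded to the nearest 10.
sample = [40, 50, 50, 60, 60, 60, 70, 70]
print(pmf(sample))
print(cdf(sample))
```

The PMF gives the share of observations at each value, while the CDF accumulates those shares, which makes it easier to compare where two groups sit relative to each other.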
This project used Tripadvisor text reviews and ratings of various Disney parks around the world. Graph analysis was used to compare the rankings of each resort and see which one had the highest satisfaction rating. Feature extraction was performed on the variables, and text analysis was then used to find the reviews with positive ratings based on their comments. A text classification model labeled each review as positive or negative overall to show which parks had better guest satisfaction.
This project focuses on a real-life application of data science at my current job: reporting the work that has been done over the last fiscal year, describing how the work has changed, and identifying what work is left to be done. A new digital platform was introduced in 2019, so it is also necessary to show the impact of that platform on the current fiscal year. The main business problem is that we need to show executives that the work we have been doing over the past year is important and continues to grow, and that both permanent and temporary positions are still needed. This analysis and presentation are necessary to maintain our current funding and to justify why we may need more funding in the future. Some research questions include: How does the new digital platform impact the work? How much of the original work is still being done? How efficient is this new platform? Does the efficiency mean fewer or more positions are needed to get the work done?
This project studies the effects that alcohol consumption has on high school performance using Python and R. There was a large amount of categorical data from two different sets of classes, and multiple visual and statistical analyses were performed on the data to see what else affects high school performance other than alcohol. Random forest regression and classification models were built to identify what could predict future high school performance.
Using ggplot and other techniques in R, this project takes three data sets with different movie ratings and performs EDA using visualizations and statistical analysis. The goal was to discover what types of films are more successful, which directors have the highest-rated movies, and which production companies make the best movies. IMDb and Rotten Tomatoes ratings were used as a guideline to gauge what makes a movie a success. Streaming data was also analyzed to see which platforms tended to have higher-rated movies and which genres each streaming service tended to favor.
This project focuses on video game sales, combining exploratory data analysis with predictive modeling to gather information for future game designs. Questions answered in the analysis include: What makes a great video game? Which console or platform has the highest sales? Which games had the highest sales? Can we predict what games might do well in the future? Did the 2020 pandemic affect sales positively or negatively?
This project uses Australian weather data to demonstrate various data extraction and cleaning techniques. Weather data was pulled from three sources: a local CSV file, a Wikipedia page, and a website accessed through an API call. Each source was brought into Python and cleaned for analysis using various cleaning methods. All the data was then merged into one data set using SQLite.
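As a rough illustration of the final merge step, here is a minimal sketch that joins two already-cleaned sources inside SQLite using Python's built-in sqlite3 module. It is not the project's actual code: the table names, column names, and join keys are assumptions made for the example, and only two of the three sources are shown to keep it short.

```python
import sqlite3


def merge_weather_sources(csv_rows, api_rows):
    """Join two cleaned sources on (city, date) in an in-memory SQLite database.

    Table and column names are hypothetical placeholders.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE csv_weather (city TEXT, date TEXT, rainfall_mm REAL)")
    conn.execute("CREATE TABLE api_weather (city TEXT, date TEXT, max_temp_c REAL)")
    # Parameterized inserts keep the values separate from the SQL text.
    conn.executemany("INSERT INTO csv_weather VALUES (?, ?, ?)", csv_rows)
    conn.executemany("INSERT INTO api_weather VALUES (?, ?, ?)", api_rows)
    merged = conn.execute(
        """
        SELECT c.city, c.date, c.rainfall_mm, a.max_temp_c
        FROM csv_weather AS c
        JOIN api_weather AS a
          ON a.city = c.city AND a.date = c.date
        ORDER BY c.city, c.date
        """
    ).fetchall()
    conn.close()
    return merged


# Made-up rows standing in for the cleaned CSV and API data.
csv_rows = [("Sydney", "2020-01-01", 1.2), ("Perth", "2020-01-01", 0.0)]
api_rows = [("Sydney", "2020-01-01", 28.5)]
print(merge_weather_sources(csv_rows, api_rows))
```

An inner join keeps only the city and date pairs present in both sources; a LEFT JOIN would be the choice if unmatched rows from the first source should be kept.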
This project is a simple but effective application that demonstrates Python coding expertise by pulling weather information from the internet and displaying the results. A user can enter any location as either a zip code or a city name, and the application retrieves various weather data for that location using an API key. An error message is displayed if an incorrect or invalid location is entered.
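The input handling described above could be sketched roughly as follows. This is an illustrative example rather than the application's actual code: the function name, the five-digit zip code rule, and the "zip" and "q" parameter names are assumptions, and the real weather API may expect different parameters.

```python
import re

# Assumed rules: a zip code is exactly five digits; a city name starts with a
# letter and may contain letters, spaces, periods, apostrophes, and hyphens.
ZIP_PATTERN = re.compile(r"\d{5}")
CITY_PATTERN = re.compile(r"[A-Za-z][A-Za-z .'-]*")


def build_query(location):
    """Turn user input into request parameters, or raise ValueError if invalid."""
    text = location.strip()
    if ZIP_PATTERN.fullmatch(text):
        return {"zip": text}  # hypothetical parameter name
    if CITY_PATTERN.fullmatch(text):
        return {"q": text}  # hypothetical parameter name
    raise ValueError(f"Invalid location: {location!r}")


for entry in ["68005", "Omaha", "12ab!"]:
    try:
        print(entry, "->", build_query(entry))
    except ValueError as error:
        print("Error:", error)
```

Validating the input before calling the API lets the application show a clear error message without spending a request on a location that cannot be valid.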