This repository contains data science projects from the "Data Scientist with Python" course on DataCamp.
-
Investigating Netflix Movies 🍿
-
Apply foundational Python skills to answer a real-world question: are Netflix's movies getting shorter over time? The project uses everything from lists and loops to pandas and matplotlib, and builds experience in an essential data science skill: exploratory data analysis.
pandas
matplotlib
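
As a minimal sketch of the kind of analysis this project involves, the snippet below averages movie durations per release year with pandas. The column names (`type`, `release_year`, `duration`) and all values are assumptions made up for illustration, not the actual course dataset.

```python
import pandas as pd

# Hypothetical sample data; the real dataset's columns may differ.
netflix = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "Movie"],
    "release_year": [2010, 2010, 2015, 2020, 2020],
    "duration": [120, 100, 4, 95, 85],
})

# Keep only movies, then average the duration for each release year.
movies = netflix[netflix["type"] == "Movie"]
avg_duration = movies.groupby("release_year")["duration"].mean()

print(avg_duration)
```

Plotting `avg_duration` with matplotlib would then show whether the average is trending down.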
-
-
Exploring NYC Public School Test Result Scores
-
Use data manipulation and summary statistics to analyze standardized test scores across New York City's public schools: identify the schools with the best math results, compare performance across boroughs, and determine the city's top ten performing schools.
pandas
numpy
||pandas - .groupby()
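
The borough comparison described above can be sketched with `.groupby()` as follows. The school names, column names, and scores are hypothetical placeholders, not the course data.

```python
import pandas as pd

# Hypothetical data; names and scores are made up for illustration.
schools = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "borough": ["Brooklyn", "Brooklyn", "Queens", "Queens"],
    "average_math": [650, 700, 600, 640],
})

# Summary statistics of math scores per borough.
borough_stats = (
    schools.groupby("borough")["average_math"]
    .agg(["count", "mean", "std"])
    .round(2)
)

# Top performers: sort by score and keep the first rows (2 here, 10 in the project).
top_schools = schools.sort_values("average_math", ascending=False).head(2)

print(borough_stats)
print(top_schools)
```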
-
-
Visualizing the History of Nobel Prize Winners
-
The Nobel Prize is awarded yearly to scientists and scholars in chemistry, literature, physics, medicine, economics, and peace, with the first prize awarded in 1901. Are there any biases in the way the honors are awarded? Use your data manipulation and visualization skills to explore the history of this coveted prize.
pandas
numpy
seaborn
||pandas - .groupby()|.value_counts()
seaborn - .relplot()
-
-
Analyzing Crime in Los Angeles 👮
-
Find out when and where crime is most likely to occur, along with the types of crimes commonly committed in LA. Analyze crime data to guide the Los Angeles Police Department on how they should allocate resources to protect the people of their city.
pandas
numpy
matplotlib
seaborn
||pandas - .cut()
seaborn - .countplot()|.barplot()
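
One way `pd.cut()` might be used here is to bin a numeric column into labeled ranges before counting. The sketch below bins a hypothetical hour-of-day column; the column name, bin edges, and labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical hours (0-23) at which crimes occurred.
crimes = pd.DataFrame({"hour_occ": [0, 5, 12, 13, 18, 23]})

# Left-closed bins: [0, 6), [6, 12), [12, 18), [18, 24).
crimes["period"] = pd.cut(
    crimes["hour_occ"],
    bins=[0, 6, 12, 18, 24],
    labels=["night", "morning", "afternoon", "evening"],
    right=False,
)

# Count how many crimes fall in each period, keeping the bin order.
period_counts = crimes["period"].value_counts(sort=False)

print(period_counts)
```

The binned column can then be passed to `seaborn.countplot()` to visualize the distribution.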
-
-
Customer Analytics - Preparing Data for Modeling
-
Apply your knowledge of data types and categorical data to prepare a large dataset for modeling. Predictive models are hard to translate into real business value if the training data isn't stored efficiently. You'll convert data types, create ordered categories, and filter ordered categorical data so the data is ready for modeling.
pandas
||pandas - .info()|.memory_usage()|.astype()
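
A minimal sketch of converting data types and filtering an ordered category with `.astype()` is shown below. The column names, category levels, and values are hypothetical.

```python
import pandas as pd

# Hypothetical customer data.
df = pd.DataFrame({
    "education_level": ["Graduate", "High School", "Masters", "Graduate"],
    "training_hours": [36, 47, 83, 52],
})

# Ordered category: High School < Graduate < Masters.
levels = ["High School", "Graduate", "Masters"]
df["education_level"] = df["education_level"].astype(
    pd.CategoricalDtype(categories=levels, ordered=True)
)

# A smaller integer type is enough for this column.
df["training_hours"] = df["training_hours"].astype("int32")

# Ordered categories support comparisons, so filtering by rank works.
at_least_graduate = df[df["education_level"] >= "Graduate"]

# Inspect per-column memory use after the conversion.
print(df.memory_usage(deep=True))
print(at_least_graduate)
```

On a dataset this small the memory savings are negligible; the benefit of categorical and narrower numeric types shows up on columns with many rows and few distinct values.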
-
-
Exploring Airbnb Market Trends
-
Apply your skills in importing, cleaning, and manipulating data to explore New York City Airbnb data. New York City has a variety of Airbnb listings to meet the high demand for temporary lodging, with several different price levels, room types, and locations. In this project, you'll use those skills to report insights to a real estate start-up.
pandas
numpy
||pandas - .merge()|.to_datetime()|.str.lower()|.str.replace()
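
The cleaning and merging steps listed above might look like the following sketch. The three tables, their column names, and the raw string formats are assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw tables sharing a listing_id key.
prices = pd.DataFrame({"listing_id": [1, 2], "price": ["225 dollars", "89 dollars"]})
room_types = pd.DataFrame({"listing_id": [1, 2], "room_type": ["Entire Home/Apt", "PRIVATE ROOM"]})
reviews = pd.DataFrame({"listing_id": [1, 2], "last_review": ["May 21 2019", "July 05 2019"]})

# Strip the unit text and convert prices to numbers.
prices["price"] = prices["price"].str.replace(" dollars", "", regex=False).astype(float)

# Normalize inconsistent capitalization.
room_types["room_type"] = room_types["room_type"].str.lower()

# Parse review dates written as "<Month> <day> <year>".
reviews["last_review"] = pd.to_datetime(reviews["last_review"], format="%B %d %Y")

# Combine the cleaned tables on the shared key.
listings = prices.merge(room_types, on="listing_id").merge(reviews, on="listing_id")

print(listings)
```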
-
-
Modeling Car Insurance Claim Outcomes
-
Clean customer data and use logistic regression to predict whether people will make a claim on their car insurance.
pandas
numpy
statsmodels
||statsmodels - logit()
accuracy
-
-
Hypothesis Testing with Men's and Women's Soccer Matches ⚽
-
Perform a hypothesis test to determine if more goals are scored in women's soccer matches than men's.
pandas
matplotlib
pingouin
||hypothesis testing
-
-
Predictive Modeling for Agriculture
-
Dive into agriculture using supervised machine learning and feature selection to aid farmers in crop cultivation and solve real-world problems. A farmer has reached out to you, as a machine learning expert, for help selecting the best crop for his field. Due to budget constraints, he can only afford to measure one of the four essential soil measures. This is a classic feature selection problem, where the objective is to pick the single most important feature for predicting the crop accurately.
pandas
matplotlib
sklearn
||supervised machine learning
sklearn - LogisticRegression
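
One simple approach to this kind of single-feature selection is to fit a separate `LogisticRegression` on each feature and compare test accuracy. The sketch below uses synthetic data; the feature names (`N`, `P`, `K`, `ph`) and the way the data is generated are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data: only "K" is constructed to depend on the crop label.
rng = np.random.default_rng(0)
n = 300
crop = rng.integers(0, 3, size=n)
soil = pd.DataFrame({
    "N": rng.normal(50, 10, n),
    "P": rng.normal(50, 10, n),
    "K": crop * 20 + rng.normal(0, 2, n),
    "ph": rng.normal(6.5, 0.5, n),
})

X_train, X_test, y_train, y_test = train_test_split(
    soil, crop, test_size=0.25, random_state=0, stratify=crop
)

# Fit one model per feature and record its accuracy on held-out data.
scores = {}
for feature in soil.columns:
    model = LogisticRegression(max_iter=2000)
    model.fit(X_train[[feature]], y_train)
    scores[feature] = accuracy_score(y_test, model.predict(X_test[[feature]]))

best_feature = max(scores, key=scores.get)
print(scores)
print("Best single feature:", best_feature)
```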
-
-
Clustering Antarctic Penguin Species 🐧
-
Unsupervised learning plays the central role in this project. The objective is to explore a curated dataset about penguins using unsupervised learning techniques. Through data exploration, feature extraction, and clustering algorithms, the project aims to uncover hidden patterns, clusters, and relationships within the dataset.
pandas
matplotlib
sklearn
||unsupervised machine learning
sklearn - StandardScaler|PCA|KMeans
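
The three sklearn tools listed above are often chained as scale, reduce, then cluster. The sketch below does this on synthetic measurements; the two groups, their means, and the choice of two clusters are assumptions for illustration, not properties of the course dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic measurements for two well-separated hypothetical groups.
rng = np.random.default_rng(42)
group_a = rng.normal([40, 17, 190], [1, 0.5, 3], size=(30, 3))
group_b = rng.normal([48, 15, 220], [1, 0.5, 3], size=(30, 3))
X = np.vstack([group_a, group_b])

# Standardize so each feature contributes on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Cluster the reduced data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_pca)

print(labels)
```

In practice the number of clusters is not known in advance; an elbow plot of `kmeans.inertia_` over several values of `n_clusters` is a common way to choose it.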
-
-
Predicting Movie Rental Durations
-
Build a regression model for a DVD rental firm to predict rental duration. Evaluate models to recommend the best one.
pandas
matplotlib
sklearn
||supervised machine learning
sklearn - Lasso|OLS|RandomForest
-