Hey there 👋, this is Daniel M. Smith

I am a Data Scientist, Data Analyst, and Systems Engineer skilled in Python, SQL, machine learning, optimization, and modeling. I use data visualization to present results to stakeholders and tell the story the data is longing to tell. My past work experience as a Java Developer, Linux System Admin, and IT Operations Manager gives me a wider viewpoint. I take a logical approach to problem solving, perform well in high-pressure situations, and thrive in team-oriented settings by enabling my teammates. I am interested in both Data Science and Data Analytics, and in solving problems.

Technical Skills:
• Languages: Python, SQL
• Predictive Modeling: Linear/Logistic Regression, Classification, Clustering, Decision Tree, Random Forest, Support Vector Machines, K-Nearest Neighbors
• Machine Learning: Deep Learning, Neural Networks, Keras, TensorFlow, Time Series
• Databases: MySQL, Oracle, SQLite, MongoDB
• Data Visualization: Matplotlib, Seaborn, Tableau
• Environments: Google Colab, Jupyter Notebook
• Data Science Methods: Gathering, Cleaning, Scrubbing, Exploration, Mining, Modeling, Visualization

Some of my Projects

This project uses classification with machine learning to model 2018 domestic airline flight delays. We performed inferential analysis of 7M+ records, looking at airlines, flight destinations, delays, and times. We then reduced the data set to just the top 5 airlines by number of flights, and reduced the origins and destinations to the top 30 instead of all 358. We also performed classification analysis with Logistic Regression, Decision Trees, Random Forests, and XGBoost, using GridSearch to narrow down the best hyperparameters, in order to predict delayed flights and assess the strength and importance of the different features and their relationship to delayed flights.

Image: 10AircraftLate.png
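The reduction to the most frequent airlines, origins, and destinations can be sketched in plain Python. This is a minimal illustration, not the project's code: the helper name `keep_top_n`, the column names, and the sample rows are all made up for the example.

```python
from collections import Counter


def keep_top_n(rows, key, n):
    """Keep only rows whose `key` value is among the n most frequent values."""
    counts = Counter(row[key] for row in rows)
    top_values = {value for value, _ in counts.most_common(n)}
    return [row for row in rows if row[key] in top_values]


# Tiny made-up sample, not the project's data.
flights = [
    {"airline": "AA", "origin": "JFK"},
    {"airline": "AA", "origin": "LAX"},
    {"airline": "AA", "origin": "JFK"},
    {"airline": "DL", "origin": "ATL"},
    {"airline": "DL", "origin": "ATL"},
    {"airline": "UA", "origin": "ORD"},
]

# Keep the two airlines with the most flights (AA and DL here).
top_two = keep_top_n(flights, "airline", 2)
```

Applying the same helper with `n=5` on the airline column and `n=30` on the origin and destination columns would mirror the reduction described above.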

This project implements a recommendation system for movies. We performed K-fold cross-validation on the movie ratings with matrix reduction algorithms, optimized with GridSearch. We aimed to minimize error, choosing RMSE as our main metric, and also tracked the time it takes to fit the model, since it has to be refit after a user updates their ratings. We used a model-based collaborative filtering approach for this first implementation.

Image: 4_FilmWizard.png
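To make the evaluation loop concrete, here is a small standard-library sketch of K-fold cross-validation scored with RMSE. It is an assumption-laden stand-in: instead of a matrix reduction model it predicts the mean rating of the training fold, and the names `rmse`, `kfold_indices`, and `cross_validated_rmse` are hypothetical.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled folds over n items."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validated_rmse(ratings, k=5):
    """Average RMSE over k folds for a baseline that predicts the
    training-fold mean (a placeholder for a real recommender model)."""
    scores = []
    for train, test in kfold_indices(len(ratings), k):
        baseline = sum(ratings[j] for j in train) / len(train)
        scores.append(rmse([ratings[j] for j in test], [baseline] * len(test)))
    return sum(scores) / len(scores)
```

Swapping the baseline for an actual model and timing each fit would give both numbers the project compares: error and fit time.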

This project uses multiple regression to model home sales data. We performed inferential analysis on over 21,000 home sales from Kings County, after removing outliers with a z-score larger than 3. In a normal distribution, about 99.7% of the data falls within a z-score of 3. We also performed a multiple regression analysis, which allows us to build a pricing model and assess the strength and importance of the different features and their relationship to the estimated price of a property.
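A z-score filter of this kind can be sketched as follows. This is a minimal version under stated assumptions: the helper name `remove_outliers`, the `price` field, and the sample values are illustrative, and the population standard deviation is used, which may differ from what the project did.

```python
from statistics import mean, pstdev


def remove_outliers(rows, key, threshold=3.0):
    """Drop rows whose `key` value lies more than `threshold`
    standard deviations from the mean (absolute z-score)."""
    values = [row[key] for row in rows]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return list(rows)  # all values identical, nothing to remove
    return [row for row in rows if abs((row[key] - mu) / sigma) <= threshold]


# Made-up prices: twenty ordinary sales plus one extreme value.
sales = [{"price": 300_000 + 1_000 * i} for i in range(20)]
sales.append({"price": 5_000_000})

cleaned = remove_outliers(sales, "price")
```

Note that a single extreme value also inflates the mean and standard deviation it is measured against, so on small samples the filter is less sensitive than the 99.7% rule suggests.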

A new feature we created was the distance from four major employment locations in Kings County. We computed it with the haversine formula, using blog posts as reference.

Image: kingsEmployers.png

We also created the district feature to divide the county into 10 separate districts based on zip codes.

Image: districts.png
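For reference, the haversine formula gives the great-circle distance between two latitude/longitude points. The sketch below is a generic implementation, not the project's code; the function names and the idea of passing employers as a name-to-coordinates mapping are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distances_to_employers(home, employers):
    """Map each employer name to its distance from `home` (lat, lon).
    `employers` is a hypothetical {name: (lat, lon)} dictionary."""
    return {name: haversine_km(*home, *loc) for name, loc in employers.items()}
```

Each resulting distance can then be added as its own column, one per employment location.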

This project analyzes movie data in order to create a portfolio strategy for entry into the entertainment industry.

Regression Analysis for Domestic Box Office with the Bass Diffusion Model and Monte Carlo Simulation

Data
• The Numbers: yearly box office revenue (11 years); weekly box office revenue (11 years); Distributor, Genre, Source, Creative Type, Inflation Adjusted Domestic Bo
• IMDB daily dumps: 8 million records covering movies, principals, actors, actresses, and directors

Methods
• Created an Actor Influence formula
• Created a Director Influence formula
• Classified each movie as part of a franchise or not
• Fit each movie to the Bass model for 3 coefficients: M (market size, initially set to 1,000,000), p (coefficient of innovation, initially set to 0.003), and q (coefficient of imitation, initially set to 0.5)
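The Bass diffusion curve behind those three coefficients can be sketched with the standard closed-form solution. The defaults below are the initial values listed above; the function names and the choice of one time step per period (for example, one week of box office) are assumptions, and the curve fitting itself is not shown.

```python
import math


def bass_cumulative(t, M=1_000_000, p=0.003, q=0.5):
    """Cumulative adoption at time t under the Bass model:
    N(t) = M * (1 - e^{-(p+q)t}) / (1 + (q/p) * e^{-(p+q)t})."""
    decay = math.exp(-(p + q) * t)
    return M * (1 - decay) / (1 + (q / p) * decay)


def bass_per_period(t, M=1_000_000, p=0.003, q=0.5):
    """New adoption during period t, i.e. between t - 1 and t."""
    return bass_cumulative(t, M, p, q) - bass_cumulative(t - 1, M, p, q)


# Per-period curve for the first 30 periods at the initial coefficients.
curve = [bass_per_period(t) for t in range(1, 31)]
```

A fit would adjust M, p, and q from these starting points until the per-period curve matches a movie's observed revenue, for example with a least-squares optimizer.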

Tableau Analysis of Data Science Cohort

CohortView

Some of my Github Stats

SunTzuLombardi

Github stats Top Langs
