Code Monkey home page Code Monkey logo

panastasiadis / data-mining-operations Goto Github PK

View Code? Open in Web Editor NEW
2.0 2.0 0.0 757 KB

This repository contains three Knime workflows that aim to analyze the Air Traffic Passenger Statistics dataset from the San Francisco International Airport. The workflows include tasks such as classification comparison, regression analysis, and outlier detection using various machine learning techniques.

License: MIT License

ai bigdata dataanalysis datamining datapreprocessing datascience datavisualization knime machinelearning outlierdetection

data-mining-operations's Introduction

Data Mining of Air Traffic Passenger Statistics using Knime

This repository contains three Knime workflows that aim to analyze the Air Traffic Passenger Statistics dataset from the San Francisco International Airport. The workflows include tasks such as classification comparison, regression analysis, and outlier detection. Each workflow performs some data preprocessing and visualization tasks before applying various machine learning algorithms to the data.

This repository can serve as a starting point for anyone looking to analyze the Air Traffic Passenger Statistics dataset or to explore machine learning techniques using Knime.

Workflow 1: Classification Algorithms Comparison

This workflow compares the performance of three classification algorithms: Naive Bayes, Decision Tree, and Random Forest, on this dataset. It begins with data preprocessing steps to clean and prepare the data for analysis.

Next, the workflow uses learning curves to determine the optimal hyperparameters for the Decision Tree and Random Forest models, which helps to improve their accuracy and generalization performance. For Naive Bayes, it applies backward feature elimination.

Finally, the workflow performs 10-fold cross-validation for each algorithm and enables the ability to observe and compare their accuracies.

Pic1

Workflow 2: Regression Algorithms Comparison

This workflow compares the performance of four regression algorithms: Linear Regression, Polynomial Regression, Simple Tree Regression, and Random Forest Regression. It begins with data preprocessing steps to clean and prepare the data for analysis. Next, the workflow uses learning curves to optimize the hyperparameters of the Random Forest, Decision Tree, and Polynomial Regression models. Finally, the workflow compares the performance of the algorithms using numeric scorer nodes and visualizes their predictions using scatter plots.

Pic2

Workflow 3: Outliers Detection

This workflow aims to identify potential outliers in the aformentioned dataset. The workflow uses the Numeric Outliers node, Box Plot, and Data Explorer to observe potential outliers in the dataset. It also uses K-Means with 4 clusters in passenger count and Group-By nodes to further investigate the features of the potential outliers.

Pic3

Running the Workflow

To run any of the workflows included in this repository, follow these general steps:

  1. Download and install Knime from the official website.
  2. Clone or download this repository to your local machine.
  3. Open Knime and go to "File" > "Import KNIME Workflow".
  4. Select the desired workflow file (e.g. "classification_comparison.knwf") from the downloaded repository and click "Finish".
  5. Once the workflow is imported, click on "Execute" to run the entire workflow or run each node individually to see the intermediate results.

Note: Before running the workflow, make sure to set the correct file path to the downloaded dataset in the "CSV Reader" node.

data-mining-operations's People

Contributors

panastasiadis avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.