EDISS-DS-MINI2

Abstract

This machine learning project addresses a classification task: predicting the final grades of 107 students enrolled in a fully online, nine-week machine learning course. The course was administered through the Moodle learning management system, and the dataset records students' performance on mini projects, quizzes, and peer reviews, along with the final grade. After exploratory data analysis, data preprocessing, feature selection, and algorithm selection, the resulting models consistently exceed 77% accuracy.

1 Introduction

As education moves online, accurately predicting student performance in online courses becomes important for proactive intervention. The project uses a dataset of 107 students and follows a structured predictive-modeling workflow. Exploratory data analysis (EDA), data preprocessing, and feature selection reduce the feature set before algorithms are selected and tuned.

2 EDA, Data Preprocessing and Feature Selection

2.1 Exploratory Data Analysis (EDA)

The first step is to examine and visualize the dataset. It has 107 rows and 48 columns, with no missing values or outliers. The columns span several data types, and characteristics such as the quiz and project grades are explored.
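The checks described above can be sketched with pandas. This is a minimal illustration on synthetic data, not the project's notebook code: the column names (`Week2_Quiz1`, `Week3_MP1`, `Grade`) and the `summarize` helper are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the course data: 107 students, hypothetical columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ID": np.arange(107),
    "Week2_Quiz1": rng.uniform(0, 5, size=107).round(2),
    "Week3_MP1": rng.uniform(0, 15, size=107).round(2),
    "Grade": rng.integers(0, 6, size=107),
})

def summarize(frame):
    """Return basic EDA facts: shape, total missing values, dtype counts."""
    return {
        "shape": frame.shape,
        "missing": int(frame.isna().sum().sum()),
        "dtypes": frame.dtypes.astype(str).value_counts().to_dict(),
    }

summary = summarize(df)
print(summary)
```

On the real dataset the same call would be expected to report a shape of (107, 48) and zero missing values, matching the figures above.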

2.2 Data Preprocessing

Because the dataset is already clean, preprocessing is limited to removing uniform features (columns that hold the same value for every student) and the 'ID' feature.
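A minimal sketch of this step, assuming "uniform" means a column with a single unique value. The helper name `drop_uninformative` and the toy columns are hypothetical.

```python
import pandas as pd

def drop_uninformative(frame, id_col="ID"):
    """Drop the identifier column and any column with a single unique value."""
    constant_cols = [c for c in frame.columns if frame[c].nunique(dropna=False) <= 1]
    to_drop = set(constant_cols)
    if id_col in frame.columns:
        to_drop.add(id_col)
    return frame.drop(columns=sorted(to_drop))

# Toy data with hypothetical column names.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Week1_Stat0": [0, 0, 0, 0],          # uniform feature: same value for everyone
    "Week2_Quiz1": [3.5, 4.0, 2.0, 5.0],
    "Grade": [3, 4, 2, 5],
})

cleaned = drop_uninformative(df)
print(list(cleaned.columns))
```

A constant column carries no information that separates one student from another, and an identifier can only encourage a model to memorize rows, which is why both are safe to drop.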

2.3 Feature Selection

Features are selected by correlation analysis, which leaves 18 features and 1 label in the dataset.
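One common way to select features by correlation is to keep those whose absolute Pearson correlation with the label exceeds a cutoff. The sketch below assumes that approach. The 0.5 threshold, the helper name, and the synthetic columns are illustrative, and the source does not state which criterion was actually used.

```python
import numpy as np
import pandas as pd

def select_by_correlation(frame, label, threshold=0.5):
    """Keep features whose absolute Pearson correlation with the label meets the threshold."""
    corr = frame.corr()[label].drop(label).abs()
    return corr[corr >= threshold].sort_values(ascending=False).index.tolist()

# Synthetic data: one feature tied to the label, one pure-noise feature.
rng = np.random.default_rng(0)
grade = rng.integers(0, 6, size=107).astype(float)
df = pd.DataFrame({
    "strong": grade * 2.0 + rng.normal(0, 0.5, size=107),
    "noise": rng.normal(0, 1, size=107),
    "Grade": grade,
})

selected = select_by_correlation(df, "Grade")
print(selected)
```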

3 Algorithm Selection

To address the classification problem, several machine learning algorithms are evaluated, including K-Nearest Neighbors, Decision Tree, and Random Forest. Cross-validation is used for a fair comparison, and Decision Tree is the top performer in this comparison.
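A cross-validated comparison of this kind can be sketched with scikit-learn. The data here is synthetic (`make_classification` with 107 samples and 18 features), and the fold count, class count, and default hyper-parameters are assumptions, so the scores say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the selected features and grade label.
X, y = make_classification(
    n_samples=107, n_features=18, n_informative=6, n_classes=3, random_state=0
)

models = {
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# The same folds are reused for every model so the comparison is like-for-like.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

Stratified folds keep the class proportions roughly equal across splits, which matters with only 107 samples.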

4 Model Selection and Hyper-parameter Tuning

Three algorithms are selected for model training: K-Nearest Neighbors, Decision Tree, and Random Forest. Hyper-parameter tuning is performed to optimize each model.

4.1 K-Nearest Neighbors

Hyper-parameter tuning for KNN involves pruning features, optimizing the number of neighbors (K), and thresholding accuracy scores.
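The search over the number of neighbors can be sketched as a loop over candidate values of K, each scored by cross-validation. The range 1 to 15, the 5-fold setting, and the synthetic data are assumptions. The feature pruning and accuracy thresholding mentioned above are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the selected features and grade label.
X, y = make_classification(
    n_samples=107, n_features=18, n_informative=6, n_classes=3, random_state=0
)

k_values = range(1, 16)
mean_scores = [
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in k_values
]

# Pick the K with the highest mean cross-validated accuracy.
best_k = k_values[int(np.argmax(mean_scores))]
print(f"best K = {best_k}, accuracy = {max(mean_scores):.3f}")
```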

4.2 Decision Tree

Grid search is applied to identify optimal hyper-parameters for the Decision Tree. The model's overfitting tendencies are visualized and feature importance is analyzed.
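A minimal grid-search sketch for a decision tree, assuming scikit-learn's `GridSearchCV`. The parameter grid and the synthetic data are illustrative rather than the values used in the project. Requesting training scores makes the gap between training and validation accuracy visible, which is one simple way to gauge overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the selected features and grade label.
X, y = make_classification(
    n_samples=107, n_features=18, n_informative=6, n_classes=3, random_state=0
)

# Hypothetical grid; the project's actual grid is not given in the text.
param_grid = {"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 3, 5]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
    return_train_score=True,
)
search.fit(X, y)

best = search.best_index_
train_acc = search.cv_results_["mean_train_score"][best]
val_acc = search.cv_results_["mean_test_score"][best]
importances = search.best_estimator_.feature_importances_

print(search.best_params_)
print(f"train accuracy = {train_acc:.3f}, validation accuracy = {val_acc:.3f}")
print("most important feature index:", int(importances.argmax()))
```

A large gap between the two accuracies suggests the tree is fitting noise in the training folds.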

4.3 Random Forest

Similar to Decision Tree, Random Forest undergoes grid search for optimal hyper-parameters. The ensemble method outperforms Decision Tree, demonstrating higher accuracy.
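The same grid-search pattern applies to a random forest. The sketch below uses a small hypothetical grid over the number of trees and the tree depth. The grid values and the synthetic data are assumptions, and the printed score is not the project's result.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the selected features and grade label.
X, y = make_classification(
    n_samples=107, n_features=18, n_informative=6, n_classes=3, random_state=0
)

# Hypothetical grid, kept small so the search runs quickly.
rf_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    rf_grid,
    cv=5,
    scoring="accuracy",
)
rf_search.fit(X, y)

print(rf_search.best_params_)
print(f"cross-validated accuracy = {rf_search.best_score_:.3f}")
```

Averaging many trees trained on bootstrap samples reduces the variance of a single deep tree, which is the usual reason an ensemble generalizes better.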

5 Conclusion

5.1 Model Comparison

The Decision Tree, KNN, and Random Forest models are compared, highlighting their respective strengths and weaknesses. Decision Tree exhibits overfitting, while KNN and Random Forest achieve better accuracy.

5.2 Scientific Bottlenecks

Limited data and limited data science experience were the main challenges. With more data, the simplicity of the grading mechanism and the potential relationships between learning activities and grades could be explored further.

The project was a significant learning experience in data science. The aim for future work is to master a broader range of algorithms and evaluation methods in order to build more robust models.
