This R project analyzes real estate valuation, using a dataset with details like house age, distance to MRT stations, and convenience stores. It involves data preparation, exploratory analysis, linear modeling, and cross-validation techniques for developing predictive models of house prices, evaluated by Mean Squared Error.
The project is a data analysis and modeling exercise in R focused on real estate valuation. Its dataset covers transaction dates, house age, distance to MRT stations, number of convenience stores, geographic coordinates, and house prices per unit area.
- Loading and cleaning the dataset.
- Renaming columns for clarity.
- Visualizing and statistically analyzing the dataset to uncover patterns, distributions, and variable relationships.
- Developing linear models to understand relationships between house prices and dataset features.
- Fitting both simple and multiple linear regression models.
- Examining linear model residuals to validate model assumptions and identify limitations.
- Employing methods like Leave-One-Out Cross-Validation (LOOCV) and K-fold cross-validation for model performance evaluation.
- Using advanced regression techniques such as Ridge Regression, Lasso Regression, Principal Component Regression (PCR), and Partial Least Squares Regression (PLS).
- Comparing models by Mean Squared Error (MSE).
- Presenting statistical model outputs and their performance metrics.
- Analyzing results to derive insights into factors affecting real estate prices.
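
The modeling and cross-validation steps listed above can be sketched in base R roughly as follows. This is a minimal illustration, not the project's actual script: the data are synthetic, and the column names (`house_age`, `mrt_distance`, `n_stores`, `price`) and the helper `kfold_mse` are hypothetical names chosen for the example.

```r
# Synthetic stand-in data; the real project loads its own dataset.
# All column names here are assumptions made for illustration.
set.seed(1)
n <- 200
houses <- data.frame(
  house_age    = runif(n, 0, 40),
  mrt_distance = runif(n, 50, 5000),
  n_stores     = sample(0:10, n, replace = TRUE)
)
houses$price <- 45 - 0.25 * houses$house_age -
  0.005 * houses$mrt_distance + 1.2 * houses$n_stores + rnorm(n, sd = 5)

# Hypothetical helper: K-fold cross-validated MSE for a linear model.
kfold_mse <- function(formula, data, k = 10) {
  # Randomly assign each row to one of k folds of near-equal size.
  folds    <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  fold_mse <- sapply(seq_len(k), function(i) {
    fit  <- lm(formula, data = data[folds != i, ])  # train on k - 1 folds
    test <- data[folds == i, ]                      # hold out fold i
    mean((test[[response]] - predict(fit, newdata = test))^2)
  })
  mean(fold_mse)
}

full_formula <- price ~ house_age + mrt_distance + n_stores

# Simple vs. multiple linear regression, compared by 10-fold CV MSE.
simple_mse   <- kfold_mse(price ~ mrt_distance, houses)
multiple_mse <- kfold_mse(full_formula, houses)

# LOOCV is the special case where every fold holds a single observation.
loocv_mse <- kfold_mse(full_formula, houses, k = nrow(houses))

print(c(simple = simple_mse, multiple = multiple_mse, loocv = loocv_mse))
```

The same helper pattern extends to other model types by swapping the `lm()` call for another fitting function. Packages such as `boot`, `glmnet`, and `pls` provide built-in cross-validation routines that may be more convenient than a hand-written loop.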
This project is ideal for data analysts, real estate market analysts, and researchers focusing on applying statistical methods to real-world data.
Make sure R and the required packages are installed before running the project scripts.
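
One way to check for missing packages before running anything is sketched below. The package list is an assumption based on the techniques this project uses (`glmnet` for Ridge and Lasso, `pls` for PCR and PLS, `boot` for cross-validation helpers); adjust it to match the `library()` calls in the scripts.

```r
# Assumed package list; not taken from the project's scripts.
required <- c("glmnet", "pls", "boot")

# Packages from the list that are not yet installed in this R library.
not_installed <- setdiff(required, rownames(installed.packages()))

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
} else {
  message("All listed packages are installed.")
}
```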
Contributions are welcome. Feel free to fork the repository, make changes, and submit pull requests.