This project explores the fundamental ideas behind machine learning through a simple linear regression model. It implements two approaches for finding the best-fit line for a given dataset: the analytic (normal-equation) solution and gradient descent. The primary goal is to understand how these methods work and to compare their effectiveness on synthetic datasets.
- Analytic Solution: Calculates the optimal weights for the regression model analytically using the normal equation.
- Gradient Descent: Iteratively adjusts the weights to minimize the cost function, showcasing the practical application of this widely used optimization technique.
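The two fitting strategies above can be sketched as follows. This is a minimal illustration with hypothetical function names, not the project's actual implementation; both functions prepend a bias column so the intercept is learned alongside the feature weights.

```python
import numpy as np

def fit_analytic(X, y):
    """Closed-form weights via the normal equation: w = (X^T X)^{-1} X^T y."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend bias column
    # Solve the linear system rather than inverting X^T X explicitly.
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

def fit_gradient_descent(X, y, learning_rate=0.0001, num_iter=1000):
    """Iteratively step the weights along the negative MSE gradient."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    n = Xb.shape[0]
    w = np.zeros(Xb.shape[1])
    for _ in range(num_iter):
        grad = (2.0 / n) * Xb.T @ (Xb @ w - y)  # gradient of mean squared error
        w -= learning_rate * grad
    return w
```

With a well-chosen learning rate, gradient descent converges toward the same weights that the normal equation produces in one step.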
- Python 3.x
- numpy
This project uses synthetic datasets, provided in `.in` files, to train the linear regression models. Each `.in` file contains multiple lines, each representing a data point. A line consists of space-separated real numbers, where the last number is the target variable (y) and the preceding numbers are the feature variables (x1, x2, ..., xM).
| x1 | x2 | y   |
|----|----|-----|
| 14 | 20 | 69  |
| 16 | 3  | -1  |
| 24 | 30 | 99  |
| 11 | 62 | 240 |
| 30 | -4 | -43 |
In this example, each row represents a data point with two features and one target variable.
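A data file in this format can be loaded in a few lines; the sketch below assumes a hypothetical helper name and uses `numpy.loadtxt`, which handles whitespace-separated numeric columns directly.

```python
import numpy as np

def load_dataset(path):
    """Load a whitespace-separated .in file; the last column is the target y."""
    data = np.atleast_2d(np.loadtxt(path))  # atleast_2d handles one-line files
    X, y = data[:, :-1], data[:, -1]
    return X, y
```

`X` then has shape `(num_points, num_features)` and `y` has shape `(num_points,)`, ready to pass to either fitting method.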
Hyperparameters for the gradient descent method are specified in `.json` files. Each `.json` file corresponds to an `.in` file and contains the learning rate and the number of iterations. For example:
```json
{
  "learning rate": 0.0001,
  "num iter": 1000
}
```

```json
{
  "learning rate": 0.01,
  "num iter": 1000
}
```
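Reading such a file is a one-liner with the standard library; note that the keys shown above contain spaces, so they must be accessed as string literals. The helper name below is hypothetical:

```python
import json

def load_hyperparams(path):
    """Read gradient-descent hyperparameters from a .json config file."""
    with open(path) as f:
        params = json.load(f)
    # Keys contain spaces, matching the example config files.
    return params["learning rate"], params["num iter"]
```

The returned pair can be forwarded straight into the gradient descent routine as its learning rate and iteration count.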
Clone this repository to your local machine:

```bash
git clone https://github.com/SatvikVarshney/LinearRegressionFromScratch.git
```

After cloning, navigate to the project directory:

```bash
cd LinearRegressionFromScratch
```

Install the required dependencies:

```bash
pip install -r requirements.txt
```