Data was collected from the Coursera Skills Network Lab, downloaded, and saved in a pandas DataFrame. For data wrangling, we checked for missing or null values and replaced them with the column mean, selected the columns suitable for predicting price, standardized and normalized the data, and created dummy (indicator) variables for the categorical columns.

Exploratory data analysis was done to find the variables that have the most impact on price. We used visualizations to examine the relationship between price and each variable, and computed the correlations among bore, stroke, compression-ratio, and horsepower. Engine-size has a positive linear relationship with price, highway-mpg has a negative linear relationship with price, and peak-rpm shows no clear relationship with price. We examined the relationship between the categorical variables and price with box plots and carried out descriptive statistical analysis. We grouped the data by drive-wheels (three categories), created a pivot table of drive-wheels versus body-style, and plotted a heat map of price across body-style and drive-wheels. We also calculated the Pearson correlation between price and the other variables. Finally, we applied Analysis of Variance (ANOVA), a statistical method that tests whether there are significant differences between the means of two or more groups. ANOVA returns two values: the F-test score and the p-value.

We fitted simple linear regression (SLR) and multiple linear regression (MLR) models and used regression and residual plots to check whether each model fits the data well. We created pipelines for the SLR, MLR, and polynomial regression (PR) models and evaluated them in-sample by comparing their R² scores and mean squared errors (MSE). A better-fitting model has a higher R² and a lower MSE; because all three models are evaluated on the same data, a lower MSE always comes with a higher R². Comparing SLR and MLR, MLR has the lower MSE and the higher R², so MLR fits better. Comparing SLR and PR, PR has the lower MSE and therefore the higher R², so PR fits better than SLR. Comparing MLR and PR, MLR has the lower MSE and the higher R², so MLR fits better.
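The wrangling steps described above (mean imputation, normalization, dummy variables) can be sketched in pandas as follows. This is a minimal illustration, not the project's notebook: the rows are invented toy values, and normalization is assumed to be simple feature scaling (dividing by the column maximum), which is one common choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows for illustration only; not the project's data.
df = pd.DataFrame({
    "horsepower": [111.0, np.nan, 154.0, 102.0],
    "length": [168.8, 171.2, 176.6, 177.3],
    "fuel-type": ["gas", "gas", "diesel", "gas"],
})

# 1. Count missing values per column.
print(df.isnull().sum())

# 2. Replace missing values with the column mean (mean of the non-missing values).
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# 3. Normalize by simple feature scaling (assumed method): divide by the
#    column maximum, so the largest value becomes 1.0.
df["length"] = df["length"] / df["length"].max()

# 4. Dummy (indicator) variables: one 0/1 column per category.
dummies = pd.get_dummies(df["fuel-type"], prefix="fuel-type", dtype=int)
df = pd.concat([df.drop(columns="fuel-type"), dummies], axis=1)
print(df)
```

The missing horsepower entry becomes the mean of the three known values, and `fuel-type` is replaced by the columns `fuel-type_diesel` and `fuel-type_gas`.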
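The grouping by drive-wheels and the drive-wheels versus body-style pivot table described above might look like this in pandas. The prices are invented toy values; combinations that do not occur in the data are filled with 0, which is an assumed convention.

```python
import pandas as pd

# Hypothetical toy rows (invented prices), not the project's data.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "rwd", "4wd", "fwd", "rwd", "fwd"],
    "body-style": ["sedan", "sedan", "wagon", "hatchback", "hatchback", "sedan"],
    "price": [10000, 20000, 12000, 8000, 15000, 11000],
})

# Average price for each of the three drive-wheels categories.
by_drive = df.groupby("drive-wheels", as_index=False)["price"].mean()

# Average price per (drive-wheels, body-style) pair, reshaped into a grid
# with drive-wheels as rows and body-style as columns; missing pairs -> 0.
grouped = df.groupby(["drive-wheels", "body-style"], as_index=False)["price"].mean()
pivot = grouped.pivot(index="drive-wheels", columns="body-style", values="price").fillna(0)

print(by_drive)
print(pivot)
```

The resulting grid is what a heat map of price over body-style and drive-wheels displays, for example with matplotlib's `pcolor`.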
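The Pearson correlation and ANOVA steps described above can be sketched with `scipy.stats.pearsonr` and `scipy.stats.f_oneway`. The six rows are invented so that engine-size rises and highway-mpg falls with price; they only demonstrate how the two tests are called and read.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy rows (invented values), not the project's data.
df = pd.DataFrame({
    "engine-size": [97, 109, 130, 152, 181, 209],
    "highway-mpg": [38, 34, 30, 27, 24, 22],
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd"],
    "price": [7000, 9000, 13000, 16500, 21000, 28000],
})

# Pearson correlation coefficient r (sign = direction of the linear
# relationship) and its p-value for each predictor against price.
corr = {}
for col in ["engine-size", "highway-mpg"]:
    r, p = stats.pearsonr(df[col], df["price"])
    corr[col] = (r, p)
    print(f"{col}: r = {r:.3f}, p = {p:.4f}")

# One-way ANOVA: do the mean prices differ between drive-wheels groups?
# A large F-test score with a small p-value suggests they do.
groups = [g["price"] for _, g in df.groupby("drive-wheels")]
f_val, p_val = stats.f_oneway(*groups)
print(f"ANOVA: F = {f_val:.2f}, p = {p_val:.4f}")
```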
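The model comparison described above can be sketched with scikit-learn: an SLR on highway-mpg, an MLR on several predictors, and a PR pipeline (scaling, polynomial features, linear regression), each scored in-sample with R² and MSE. The data are synthetic, and the coefficients, the choice of predictors, and the polynomial degree of 3 are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with hypothetical coefficients; not the project's dataset.
rng = np.random.default_rng(0)
n = 50
horsepower = rng.uniform(60, 200, n)
engine_size = rng.uniform(90, 300, n)
highway_mpg = rng.uniform(16, 50, n)
price = 50 * horsepower + 80 * engine_size - 300 * highway_mpg + rng.normal(0, 1500, n)

X_slr = highway_mpg.reshape(-1, 1)  # one predictor
X_mlr = np.column_stack([horsepower, engine_size, highway_mpg])  # several predictors

models = {
    "SLR": (LinearRegression(), X_slr),
    "MLR": (LinearRegression(), X_mlr),
    # Pipeline: standardize, expand to degree-3 polynomial terms, then fit.
    "PR": (Pipeline([
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=3, include_bias=False)),
        ("model", LinearRegression()),
    ]), X_slr),
}

# In-sample evaluation: fit and score on the same data.
results = {}
for name, (model, X) in models.items():
    model.fit(X, price)
    yhat = model.predict(X)
    results[name] = {"r2": r2_score(price, yhat), "mse": mean_squared_error(price, yhat)}
    print(f"{name}: R2 = {results[name]['r2']:.3f}, MSE = {results[name]['mse']:.0f}")

# Best fit = highest R2 (equivalently, lowest MSE in-sample).
best = max(results, key=lambda name: results[name]["r2"])
print("best fit:", best)
```

Because every model is scored against the same target, R² = 1 - MSE / Var(price), so ranking the models by higher R² and by lower MSE always gives the same order in-sample.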
Of all these models, MLR (multiple linear regression) is the best fit.