Details of Hackathons:
Sales Prediction for Big Mart Outlets
The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and predict the sales of each product at a particular outlet.
Using this model, BigMart will try to understand the properties of products and outlets which play a key role in increasing sales.
Please note that the data may have missing values as some stores might not report all the data due to technical glitches. Hence, it will be required to treat them accordingly.
Data Dictionary
We have train (8523) and test (5681) data sets. The train data set has both input and output variable(s). You need to predict the sales for the test data set.
Evaluation Metric
Your model performance will be evaluated on the basis of your prediction of the sales for the test data (test.csv), which contains similar data-points as train except for the sales to be predicted. Your submission needs to be in the format as shown in sample submission.
At our end, we have the actual sales for the test data set, against which your predictions will be evaluated. We will use the Root Mean Square Error (RMSE) value to judge your response.
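As a rough illustration of the metric, RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below uses only the standard library; the function name `rmse` and the sales figures are made up for illustration and are not taken from the competition.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    # Square each error, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sales figures, for illustration only.
actual_sales = [100.0, 200.0, 300.0]
predicted_sales = [110.0, 190.0, 300.0]
score = rmse(actual_sales, predicted_sales)
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones. Lower is better.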
Please check out the notebook for the results and approach.
A total of 42320 people took part in the above hackathon; the current rank is 740.
Approach:
- Item_Weight and Outlet_Size have missing values. After checking for outliers, the missing Item_Weight values were imputed with the mean. Outlet_Size is a categorical variable, so it was imputed with the mode.
- Visualization then surfaced some interesting insights, such as which items sell the most, so a shop owner can stock those items in larger quantities.
- Implemented a base model using linear regression.
- Linear regression did not give good results, so implemented RandomForestRegressor.
- Tuned the parameters with RandomizedSearchCV and reached an RMSE of 1150.
- Tried XGBoost Regressor but was not able to reduce the RMSE.
- In the future, will try other algorithms and tune them to reduce the RMSE further.
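The imputation step in the first bullet can be sketched as follows. This is a minimal, dependency-free illustration and not the notebook's actual code: the helper `impute` and the sample rows are hypothetical, and only the column names Item_Weight and Outlet_Size come from the data set.

```python
from statistics import mean, mode


def impute(rows, column, strategy):
    """Fill None values in `column` with the mean or mode of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    # Return new dicts so the input rows are left untouched.
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Made-up rows for illustration; the real data set has many more columns.
rows = [
    {"Item_Weight": 9.0, "Outlet_Size": "Medium"},
    {"Item_Weight": None, "Outlet_Size": "Small"},
    {"Item_Weight": 15.0, "Outlet_Size": None},
    {"Item_Weight": 12.0, "Outlet_Size": "Medium"},
]
rows = impute(rows, "Item_Weight", "mean")  # numeric column: mean
rows = impute(rows, "Outlet_Size", "mode")  # categorical column: mode
```

With pandas, the same idea is usually written as `df["Item_Weight"].fillna(df["Item_Weight"].mean())` and `df["Outlet_Size"].fillna(df["Outlet_Size"].mode()[0])`.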
Sentiment analysis remains one of the key problems that has seen extensive application of natural language processing. This time around, given tweets from customers about various tech firms that manufacture and sell mobiles, computers, laptops, etc., the task is to identify whether the tweets express a negative sentiment towards such companies or products.
Evaluation Metric: The metric used to evaluate the performance of the classification model is the weighted F1-Score.
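For reference, a weighted F1-Score is commonly computed as the per-class F1 averaged with weights proportional to each class's number of true instances (its support). The sketch below assumes that definition; the function name `weighted_f1` and the labels are made up for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class's F1 by its share of the true labels.
        score += f1 * count / total
    return score


# Made-up labels: 1 = negative sentiment, 0 = not negative.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
result = weighted_f1(y_true, y_pred)
```

In scikit-learn, the equivalent call is `f1_score(y_true, y_pred, average="weighted")`.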
Please check out the notebook for the results and approach.
A total of 6910 people took part in the above hackathon; the current rank is 291.
Data Processing:
- Lower-case all characters
- Remove twitter handles
- Remove urls
- Replace Unicode characters with ASCII equivalents (unidecode)
- Keep only alphabetic characters
- Keep only words with length > 1
- Replace words like 'whatisthis' with 'what is this'
- Remove repeated spaces
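The steps listed above can be sketched as one cleaning function. This is a hedged illustration and not the notebook's code: `clean_tweet`, `segment`, and the tiny `VOCAB` set are hypothetical names, the standard-library `unicodedata` module stands in for the unidecode package, and the word-splitting step assumes a known vocabulary is available.

```python
import re
import unicodedata

# Hypothetical, tiny vocabulary; a real one would be much larger.
VOCAB = {"what", "is", "this"}


def segment(word, vocab):
    """Split `word` into vocabulary words if a full split exists, else keep it."""
    if word in vocab:
        return [word]
    # best[i] holds a split of word[:i] into vocabulary words, or None.
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            if best[start] is not None and word[start:end] in vocab:
                best[end] = best[start] + [word[start:end]]
                break
    return best[-1] if best[-1] else [word]


def clean_tweet(text, vocab=VOCAB):
    text = text.lower()                                 # lower-case
    text = re.sub(r"@\w+", " ", text)                   # remove twitter handles
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove urls
    # Approximate unidecode: strip accents, drop remaining non-ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z\s]", " ", text)               # keep letters only
    words = [w for w in text.split() if len(w) > 1]     # length > 1
    words = [part for w in words for part in segment(w, vocab)]
    return " ".join(words)                              # single spaces


sample = "@Apple whatisthis?? My caf\u00e9 photo is a blur https://t.co/abc123"
cleaned = clean_tweet(sample)
# cleaned == "what is this my cafe photo is blur"
```

Handles and URLs are removed before the letters-only filter so that their fragments (for example `https` or `co`) do not survive as ordinary words.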