Since Spark is one of the hottest technologies in Big Data, it pays to be ready by mastering some Scala!
This project is simply practice, no crazy breakthrough.
The dataset, which I got from Kaggle, contains information about the apps on the Google Play Store.
Interesting stuff!
The goal was to predict whether an app is free or paid to install.
I took out the «Price» feature because leaving it in would have given the answer away and brought no challenge whatsoever.
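
To make the point about «Price» concrete, here is a minimal sketch in plain Scala (no Spark) of turning a raw row into a training example that carries the label but not the price. The case classes, the field names and the "Paid" marker are assumptions made up for illustration, not the actual schema or code used in this project.

```scala
object DropPriceSketch {
  // Hypothetical, simplified raw row; the real dataset has more columns.
  final case class AppRow(category: String, rating: Double, price: Double, appType: String)

  // Training example: the price is deliberately left out, because a
  // non-zero price would reveal the label directly.
  final case class LabeledApp(category: String, rating: Double, isPaid: Int)

  // Assumes the raw row marks paid apps with the string "Paid".
  def toLabeled(row: AppRow): LabeledApp =
    LabeledApp(row.category, row.rating, if (row.appType == "Paid") 1 else 0)

  def main(args: Array[String]): Unit = {
    // Made-up rows, for illustration only.
    val rows = Seq(
      AppRow("GAME", 4.5, 0.0, "Free"),
      AppRow("TOOLS", 4.1, 2.99, "Paid")
    )
    rows.map(toLabeled).foreach(println)
  }
}
```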
I did some typical data cleaning, feature selection, feature engineering, data processing...
In the end, I reached an RMSE of 0.276.
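
For reference, RMSE is the square root of the mean squared difference between predictions and labels. Below is a small sketch in plain Scala of how it can be computed when free is encoded as 0 and paid as 1. That encoding, the function name and the sample values are assumptions for illustration; this is not the evaluation code from the project.

```scala
object RmseSketch {
  // Root mean squared error between two equally long, non-empty sequences.
  def rmse(predictions: Seq[Double], labels: Seq[Double]): Double = {
    require(labels.nonEmpty && predictions.length == labels.length,
      "predictions and labels must be non-empty and of equal length")
    val squaredErrors = predictions.zip(labels).map { case (p, y) => (p - y) * (p - y) }
    math.sqrt(squaredErrors.sum / squaredErrors.length)
  }

  def main(args: Array[String]): Unit = {
    // Made-up values: 0.0 = free, 1.0 = paid. One of four predictions is wrong.
    val labels      = Seq(0.0, 0.0, 1.0, 0.0)
    val predictions = Seq(0.0, 0.0, 1.0, 1.0)
    println(rmse(predictions, labels)) // prints 0.5
  }
}
```

Note that with hard 0/1 predictions each squared error is either 0 or 1, so the squared RMSE equals the share of misclassified apps.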