Created by Jeff Hebert 12/30/14
For Kaggle Competition: http://www.kaggle.com/c/datasciencebowl
This is basic starter code for a plankton classification model that uses only image dimensions and image density as features. It should produce a submission file with a public leaderboard score of 3.397518 (multi-class log loss, so lower is better). Surely you can come up with a better model!
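The kind of features described above can be sketched as follows. This is a Python illustration, not the R script's actual extract_stats function. It assumes each image is already loaded as a grid of grayscale values in [0, 1] with a light background. It treats "density" as the fraction of pixels darker than a cutoff of 0.5, which is also an assumption. The function name image_stats is hypothetical.

```python
def image_stats(img, threshold=0.5):
    """Hypothetical feature extractor: size and dark-pixel density of one image.

    img is a list of rows of grayscale values in [0, 1], where the background
    is light (near 1.0) and the organism is dark. threshold is an assumed cutoff.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    n_pixels = height * width
    # Count pixels darker than the cutoff; these are treated as organism pixels.
    dark = sum(1 for row in img for px in row if px < threshold)
    return {
        "height": height,
        "width": width,
        "ratio": width / height if height else 0.0,
        "density": dark / n_pixels if n_pixels else 0.0,
    }


# Usage: a tiny 3x2 image with two dark pixels.
example = [[1.0, 0.2], [0.1, 1.0], [1.0, 1.0]]
print(image_stats(example))
```

Each image becomes one row of numeric features, which is what a table like train_data would hold.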
- Download and unzip the competition data
- Set the data_dir variable to the path of the data directory
- Run the code. It takes 30-50 minutes; the run time is limited by disk I/O when reading the thousands of tiny image files.
- Customize the extract_stats function to add your custom variables
- Remember to add new variables to train_data and test_data
- Use cross-validation to estimate your model's score before submitting and to tune a better model.
- Please let me know if you find this starter code helpful.
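- Remember to use Rprof() and microbenchmark() (from the microbenchmark package) to find bottlenecks when your code slows down.
- Remember to use some form of version control, and give each submission file a distinct name so earlier results are not overwritten.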
- Please let me know if you find some methods to speed up the code.
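The cross-validation advice above can be illustrated with a minimal k-fold sketch scored by multi-class log loss. It is written in Python rather than the script's R. The helper names (kfold_indices, multiclass_log_loss, fit_prior, cross_validate) are hypothetical. The renormalize-then-clip step is a common way to keep log loss finite, and the 1e-15 epsilon is an assumption. The baseline model simply predicts the training fold's class frequencies, which gives a floor that any feature-based model should beat. The labels and features in the usage lines are toy data.

```python
import math
import random
from collections import Counter


def kfold_indices(n, k=5, seed=0):
    """Shuffle row indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean -log(p) of the true class; rows are renormalized, then clipped."""
    total = 0.0
    for label, row in zip(y_true, probs):
        p = row.get(label, 0.0) / sum(row.values())
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


def fit_prior(labels):
    """Baseline 'model': the class frequencies seen in the training labels."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in counts}


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Average held-out log loss over k folds for any fit/predict pair."""
    scores = []
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        probs = [predict(model, X[i]) for i in fold]
        scores.append(multiclass_log_loss([y[i] for i in fold], probs))
    return sum(scores) / len(scores)


# Usage with toy data: two classes, one dummy feature the baseline ignores.
labels = ["copepod", "diatom"] * 10
features = [{"density": 0.1 * (i % 3)} for i in range(len(labels))]
score = cross_validate(features, labels,
                       fit=lambda X, y: fit_prior(y),
                       predict=lambda model, x: model)
print(f"5-fold CV log loss of the class-prior baseline: {score:.4f}")
```

Swapping in a real model only means passing different fit and predict functions; the fold logic and scoring stay the same.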