Telstra Network Disruptions

A solution to the Kaggle Telstra competition, entered after the competition had closed, just for practice.

Score: 0.52908

https://www.kaggle.com/c/telstra-recruiting-network/overview

Approach used:

• The data explanation states: “Each row in the main dataset (train.csv, test.csv) represents a location and a time point. They are identified by the "id" column, which is the key "id" used in other data files.” Based on this, it was assumed that ‘id’ referred to the time point. The ‘id’ was then used to link and group the remaining features in the other files.

• Created ‘examine_data.py’ to examine all data files to get an overview of available data and possible features

• Created ‘feature_extraction.py’ to extract features from the data files

• This involved 3 of the ‘log’ files from the dataset (severity, event and resource), which were split into separate features based on their feature ids, such as severity_type 1 and severity_type 2. Basic count and frequency metrics by id were also calculated.

• Location and log features were treated slightly differently, as they seemed to provide richer data. Where possible, count, max, min, mean and median features were calculated. An ordering column was added based on the log features ordered by id. Finally, the location features were ranked in ascending and descending order, and by relative rank based on location id and log order.

• All the extracted features (~500) were then output to a .csv file called ‘features_extracted.csv’

• Created the file ‘feature_imp.py’, in which an XGBoost algorithm was used to rank the features by importance. The ranking was output to another .csv file (‘feature_importance.csv’). In the final model, this ranking was used to eliminate roughly half of the features generated.

• Finally, wrote a ‘csv_submission.py’ file to run the model, including 10-fold cross-validation, and to create a .csv file to submit to Kaggle

• Again this involved running an XGBoost algorithm and testing that the logloss was acceptable
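
The steps above can be illustrated with a minimal, standard-library-only sketch. It is not the code from ‘feature_extraction.py’ or ‘csv_submission.py’. It only shows, under stated assumptions, how long-format log rows might be pivoted into per-id count features, how a 10-fold split can be generated, and how a multi-class logloss can be computed. The function names, the example rows and the probability clipping value are all hypothetical, and the XGBoost model itself is deliberately left out.

```python
import math
from collections import Counter, defaultdict


def pivot_log_counts(rows, prefix):
    """Pivot long-format (id, category) rows into one count column per category."""
    counts = defaultdict(Counter)
    for row_id, category in rows:
        counts[row_id][f"{prefix}_{category}"] += 1
    # Every id gets every column, with 0 where the category never occurred.
    columns = sorted({col for c in counts.values() for col in c})
    return {
        row_id: {col: c.get(col, 0) for col in columns}
        for row_id, c in counts.items()
    }


def kfold_indices(n_samples, n_folds=10):
    """Yield (train_idx, valid_idx) pairs for a simple, unshuffled k-fold split."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


def multiclass_logloss(y_true, probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical example rows: (id, event type number), not taken from the dataset.
event_rows = [(1, 11), (1, 15), (2, 11), (2, 11)]
features = pivot_log_counts(event_rows, "event_type")

# Ten folds over 25 hypothetical training rows.
folds = list(kfold_indices(25, n_folds=10))

# Two samples, two classes, uninformative predictions.
loss = multiclass_logloss([0, 1], [[0.5, 0.5], [0.5, 0.5]])
```

In a real run, each (train, valid) pair from `kfold_indices` would be used to fit a model on the training rows and to score its predicted class probabilities on the validation rows with `multiclass_logloss`.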
