A Kaggle competition - Carvana: Don't get kicked!

Carvana

Description

Predict if a car purchased at auction is a lemon.

One of the biggest challenges for an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The auto community calls these unfortunate purchases "kicks".

Kicked cars often result from tampered odometers, mechanical issues the dealer is not able to address, issues with getting the vehicle title from the seller, or some other unforeseen problem. Kicked cars can be very costly to dealers once transportation costs, throw-away repair work, and market losses in reselling the vehicle are taken into account.

Modelers who can figure out which cars have a higher risk of being kicks can provide real value to dealerships trying to offer the best possible inventory selection to their customers.

The challenge of this competition is to predict whether a car purchased at auction is a kick (bad buy).

Guidelines for the task on data understanding

Data understanding (30 points)

  • Data semantics (3 points)
  • Distribution of the variables and statistics (7 points)
  • Assessing data quality (missing values, outliers) (7 points)
  • Variable transformations (6 points)
  • Pairwise correlations and possible elimination of redundant variables (7 points)
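As an illustration of the pairwise-correlation item, the following is a minimal sketch of one way to flag redundant numeric variables: compute Pearson correlations and greedily keep a column only if it is not strongly correlated with a column already kept. The column names, the toy values, and the 0.9 threshold are assumptions made for the example and are not taken from the competition data or the guidelines.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def drop_redundant(columns, threshold=0.9):
    """Keep a column only if |r| < threshold against every column already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy data with hypothetical column names (not the real dataset).
toy = {
    "price_avg":   [5000, 6200, 7100, 8300, 9100],
    "price_clean": [5600, 6900, 7800, 9200, 9900],  # tracks price_avg closely
    "odometer":    [90000, 40000, 75000, 62000, 81000],
}

kept_columns = drop_redundant(toy)
print(kept_columns)  # ['price_avg', 'odometer']
```

Which member of a correlated pair is kept depends here only on column order; in a report, that choice would need its own justification (for example, fewer missing values or easier interpretation).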

Guidelines for the task on clustering

  • Clustering analysis by K-means (13 points)
      • Choice of attributes and distance function (1 point)
      • Identification of the best value of k (5 points)
      • Characterization of the obtained clusters, using both an analysis of the k centroids and a comparison of the distribution of variables within the clusters against that in the whole dataset (7 points)
  • Analysis by density-based clustering (9 points)
      • Choice of attributes and distance function (2 points)
      • Study of the clustering parameters (2 points)
      • Characterization and interpretation of the obtained clusters (5 points)
  • Analysis by hierarchical clustering (5 points)
      • Choice of attributes and distance function (2 points)
      • Show and discuss different dendrograms using different algorithms (3 points)
  • Final evaluation of the best clustering approach and comparison of the clusterings obtained (3 points)
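For the "best value of k" item, a common approach is to run K-means for several values of k and look for the elbow in the sum of squared errors (SSE). The sketch below is a small pure-Python version of Lloyd's algorithm with squared Euclidean distance. The toy points and the deterministic farthest-point initialisation are assumptions made to keep the example reproducible; a real analysis would more likely use a library implementation with several random restarts.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns (centroids, SSE)."""
    # Assumption: deterministic farthest-point initialisation instead of random seeds.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

# Toy 2-D data with three visually separate groups (illustrative only).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]

sse_by_k = {k: kmeans(points, k)[1] for k in range(1, 5)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On this toy data the SSE drops sharply up to k=3 and only marginally afterwards, which is the pattern the elbow heuristic looks for. The report should still list every k that was tried and explain why the chosen one was preferred.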

Guidelines for the task on Association Rules Mining

  • Frequent pattern extraction with different values of support and different types (i.e. frequent, closed, maximal) (6 points)
  • Discussion of the most interesting frequent patterns (7 points)
  • Association rule extraction with different values of confidence (6 points)
  • Discussion of the most interesting rules (7 points)
  • Use the most meaningful rule to replace missing values and evaluate the accuracy (4 points)
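To make the distinction between frequent, closed, and maximal itemsets concrete, here is a brute-force sketch on a handful of toy transactions. The `attribute=value` items are hypothetical and are not taken from the dataset. Enumerating all candidates like this is only practical for tiny examples; on real data a dedicated Apriori or FP-Growth implementation would be used instead.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: relative support} for itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(cset <= t for t in transactions) / n
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so no larger one exists
    return result

def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of equal support."""
    return {s for s in freq if not any(s < t and freq[t] == freq[s] for t in freq)}

def maximal_itemsets(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in freq if not any(s < t for t in freq)}

def confidence(freq, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from stored supports."""
    return freq[antecedent | consequent] / freq[antecedent]

# Toy transactions with hypothetical attribute=value items.
transactions = [
    {"auction=A", "trans=AUTO", "kick=no"},
    {"auction=A", "trans=AUTO", "kick=no"},
    {"auction=A", "trans=AUTO", "kick=yes"},
    {"auction=B", "trans=AUTO", "kick=no"},
    {"auction=B", "trans=MANUAL", "kick=yes"},
]

freq_06 = frequent_itemsets(transactions, 0.6)
freq_04 = frequent_itemsets(transactions, 0.4)
print(len(freq_06), len(closed_itemsets(freq_06)), len(maximal_itemsets(freq_06)))
print(len(freq_04))
print(confidence(freq_06, frozenset({"auction=A"}), frozenset({"trans=AUTO"})))
```

Lowering the support threshold from 0.6 to 0.4 increases the number of frequent itemsets, while the closed and maximal sets remain compact summaries of them. That trade-off is what the first rubric item asks to explore.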

Guidelines for the task on Classification

  • Learning of different decision trees with different parameters and gain formulas, with the aim of maximizing performance (12 points)
  • Decision tree interpretation (6 points)
  • Decision tree validation with test and training sets (6 points)
  • Discussion of the best prediction model (6 points)
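The "gain formulas" mentioned above usually refer to impurity measures such as entropy (information gain) and the Gini index. The sketch below computes both for a single candidate split, assuming binary labels where 1 marks a kick. The labels and the split are made up for illustration and do not come from the dataset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain(labels, groups, impurity):
    """Impurity reduction obtained by splitting labels into the given groups."""
    n = len(labels)
    weighted = sum(len(g) / n * impurity(g) for g in groups)
    return impurity(labels) - weighted

# Hypothetical labels (1 = kick) and a hypothetical two-way split on some feature.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
split = [[1, 1, 1, 0], [0, 0, 0, 0]]

info_gain = gain(labels, split, entropy)
gini_gain = gain(labels, split, gini)
print(round(info_gain, 4), round(gini_gain, 4))
```

The two criteria often rank candidate splits similarly but not always identically, so comparing trees grown with each criterion, along with other parameters such as maximum depth, is one way to address the first rubric item.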

Guidelines for the Project

  • The title page is not counted in the 20-page limit, i.e. you can have 20 pages + 1 title page. The page limit is strict: additional pages (21, 22, 23, etc.) will not be read or evaluated.
  • The project size must not exceed 25 MB, i.e. you must be able to send it by email without compression.
  • Only PDF files are allowed. You do not have to submit Python code or KNIME workflows.
  • The final paper must be easily readable, i.e. it is better to use a font size larger than 9pt.
  • Use a readable font, e.g. Arial or Times New Roman.
  • You can use multiple columns and change the margin size, but the project must remain readable.
  • It is NOT required to include Python code, KNIME workflows, or theoretical descriptions of the algorithms in the final paper.
  • You must justify every choice you make regarding the features used and selected for each algorithm and the parameters you tune.
  • Discuss every result. Plots without any comment are useless.
  • Even if you find a top configuration for your algorithm (e.g. K-means with k=5), you MUST list the different parameters you tested and justify your choice.
  • You can earn up to 3 extra points on the final mark according to the following criteria:
      • Innovation (0.5 points)
      • Experimentation (0.5 points)
      • Performance (0.5 points)
      • Appearance (0.5 points)
      • Organization (0.5 points)
      • Summary (0.5 points)
