cortado-fs: high-performance, 100% F# implementation of XGBoost

Main features:

  • native support for both numeric and categorical data (covariates and factors)
  • innovative feature engineering: virtual data columns and easy conversions between numeric and categorical data
  • out-of-core data processing when dataframes are bigger than RAM
  • implementation of XGBoost logistic in just a few hundred lines of F# code, 3x faster than the original C++ implementation (tree_method=exact, single threaded)
  • easy to extend, written in functional style for easy composition
  • easily import data from Feather format
  • work in progress - POC, ported from my Python/llvm cortado implementation with the same performance characteristics

How to take it for a spin:

  • sample data with 1M observations in Feather format is included in the repo (airlinetrain.feather)
  • build from source; no NuGet package is available yet
  • run the XGRun.fsx script to fit a model (equivalent to the original XGBoost implementation with tree_method=exact, single threaded).

Factors and covariates

Numeric data is represented by the Covariate type and categorical data by the Factor type. Categorical data is stored as integer indexes pointing to string levels, similar to Feather. It is easy to convert a covariate into a factor and a factor into a covariate; see the XGRun script. All operations are lazy and make heavy use of sequences of memory slices. Slice length is an input parameter of the algorithm, and its optimal value depends on the CPU cache. Cache friendliness is probably the biggest factor in making the implementation fast.
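To make the idea concrete, here is a minimal sketch of how integer-coded factors and lazy, slice-wise conversions could look. It is an illustration only and not the cortado-fs API: the record shapes and the names slices, binned, and asCovariate are all hypothetical, and binning by cut points is just one assumed way of turning a covariate into a factor.

```fsharp
// Hypothetical sketch: these types and functions are NOT the cortado-fs API.

/// Categorical column: integer codes that index into an array of string levels.
type Factor = { Levels: string[]; Codes: int[] }

/// Numeric column.
type Covariate = { Values: float[] }

/// Lazily yield fixed-length windows over an array without copying the data.
let slices (sliceLen: int) (data: 'T[]) : seq<System.ArraySegment<'T>> =
    if sliceLen <= 0 then invalidArg "sliceLen" "slice length must be positive"
    seq {
        for start in 0 .. sliceLen .. data.Length - 1 do
            let len = min sliceLen (data.Length - start)
            yield System.ArraySegment<'T>(data, start, len)
    }

/// Covariate -> Factor (assumed approach): bin values by ascending cut points.
/// Returns the level labels and a lazy sequence of code slices; no codes are
/// computed until the sequence is enumerated.
let binned (cuts: float[]) (sliceLen: int) (cov: Covariate) : string[] * seq<int[]> =
    let levels =
        [| for i in 0 .. cuts.Length ->
            let lo = if i = 0 then "-inf" else string cuts.[i - 1]
            let hi = if i = cuts.Length then "inf" else string cuts.[i]
            sprintf "[%s,%s)" lo hi |]
    // The bin index is the number of cut points that are <= x.
    let code (x: float) = cuts |> Array.sumBy (fun c -> if c <= x then 1 else 0)
    let codes =
        slices sliceLen cov.Values
        |> Seq.map (fun seg ->
            Array.init seg.Count (fun i -> code seg.Array.[seg.Offset + i]))
    levels, codes

/// Factor -> Covariate: parse each level label once, then map codes slice by slice.
let asCovariate (sliceLen: int) (f: Factor) : seq<float[]> =
    let inv = System.Globalization.CultureInfo.InvariantCulture
    let parsed = f.Levels |> Array.map (fun s -> System.Double.Parse(s, inv))
    slices sliceLen f.Codes
    |> Seq.map (fun seg ->
        Array.init seg.Count (fun i -> parsed.[seg.Array.[seg.Offset + i]]))

// Usage: bin five distances into three levels, processing two values per slice.
let distance = { Values = [| 120.0; 850.0; 2400.0; 430.0; 1999.0 |] }
let levels, codeSlices = binned [| 500.0; 2000.0 |] 2 distance
let codes = codeSlices |> Seq.concat |> Seq.toArray   // enumeration forces the work
let binnedDistance = { Levels = levels; Codes = codes }

// Usage: a factor whose labels are numeric converts back to numbers.
let numeric = { Levels = [| "1.5"; "2.5" |]; Codes = [| 0; 1; 1 |] }
let values = asCovariate 2 numeric |> Seq.concat |> Seq.toArray
```

In this sketch the slice length only controls how many values are touched per step. Picking it so that a working slice fits in CPU cache is the kind of tuning the paragraph above refers to.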

More work required

If there is interest in this approach I am happy to transfer the code to any other repo and help in any way I can.

Contributors

adam-edgeware, amlocek
