Examples of WhizzML scripts and libraries

Each script or library lives in its own directory in this folder. Each directory contains a readme explaining the artifact's purpose and usage, the actual WhizzML code in a .whizzml file, and the JSON metadata needed to create BigML resources.

By convention, when the artifact is a library, the files are called library.whizzml and metadata.json, while for a script we use script.whizzml and metadata.json.
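The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the repository: the helper name `artifact_kind` is hypothetical, and it assumes the code and metadata files sit directly in the directory being inspected (packages that bundle several scripts may nest them in subdirectories).

```python
from pathlib import Path
from typing import Optional


def artifact_kind(package_dir: Path) -> Optional[str]:
    """Classify a directory as "library" or "script" by its file names.

    Hypothetical helper: returns None when the directory does not follow
    the library.whizzml / script.whizzml plus metadata.json convention.
    """
    # Both kinds of artifact are expected to ship a metadata.json file.
    if not (package_dir / "metadata.json").is_file():
        return None
    if (package_dir / "library.whizzml").is_file():
        return "library"
    if (package_dir / "script.whizzml").is_file():
        return "script"
    return None


if __name__ == "__main__":
    # Report the kind of every directory directly under the current folder.
    for entry in sorted(Path(".").iterdir()):
        if entry.is_dir():
            print(entry.name, artifact_kind(entry))
```

Run from the top of a checkout, this prints one line per directory, with `None` for directories that do not match the convention at their top level.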

Examples

  • covariate-shift: Determine if there is a shift in data distribution between two datasets.
  • model-or-ensemble: Decide whether to use models or ensembles for predictions, given an input source.
  • remove-anomalies: Using Flatline and an anomaly detector, remove the anomalous rows from an input dataset.
  • smacdown-branin: Simple example of SMACdown, using the Branin function as evaluator.
  • smacdown-ensemble: Use SMACdown to discover the best possible ensemble to model a given dataset id.
  • find-neighbors: Using cluster distances as a metric, find instances in a dataset close to a given row.
  • stacked-generalization: Simple stacking using decision trees, ensembles and logistic regression.
  • best-first: Feature selection using a greedy algorithm.
  • gradient-boosting: Boosting algorithm using gradient descent.
  • model-per-cluster: Scripts and library to model data after clustering and make predictions using the resulting per-cluster model.
  • best-k: Scripts and library implementing the Pham-Dimov-Nguyen algorithm for choosing the best k in k-means clusters.
  • seeded-best-k: Scripts and library implementing the Pham-Dimov-Nguyen algorithm for choosing the best k in k-means clusters, with user-provided seeds.
  • anomaly-shift: Calculate the average anomaly between two given datasets.
  • cross-validation: Scripts for performing k-fold cross-validation.
  • clean-dataset: Scripts and library for cleaning up a dataset.
  • boruta: Script for feature selection using the Boruta algorithm.
  • cluster-classification: Script that determines which input fields are most important for differentiating between clusters.
  • anomaly-benchmarking: Script that takes any dataset (classification or regression) and turns it into a binary classification problem with the two classes "normal" and "anomalous".

Compiling packages and running tests

The makefile at the top level provides targets to register packages and run tests (when they're available). It needs a working installation of bigmler. Just type

make help

for a list of possibilities, including:

  • tests to run all available test scripts, which live in the test subdirectory of some packages and typically use bigmler.

  • compile to use bigmler to register in BigML the resources associated with one or more packages in the repository.

  • clean to delete resources and outputs (both remote and local) created by compile.

  • distcheck combines most of the above to check that all the scripts in the repository are working: this target should build cleanly before merging into

The verbosity of the test output can be controlled with the variable VERBOSITY, which runs from 0 (the default, mostly silent) to 2. For example:

make tests VERBOSITY=1
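When the tests are driven from another program, the same invocation can be assembled and validated first. This is only a sketch under stated assumptions: the function names `tests_command` and `run_tests` are hypothetical, VERBOSITY is assumed to take only the integer values 0, 1 and 2, and `make` and bigmler are assumed to be installed.

```python
import subprocess
from typing import List


def tests_command(verbosity: int = 0) -> List[str]:
    """Build the argument list for `make tests` at a given verbosity.

    Hypothetical helper; assumes VERBOSITY accepts the integers 0 to 2.
    """
    if verbosity not in (0, 1, 2):
        raise ValueError("VERBOSITY must be 0, 1 or 2")
    return ["make", "tests", f"VERBOSITY={verbosity}"]


def run_tests(verbosity: int = 0, repo_root: str = ".") -> int:
    """Run the tests target from the repository root and return its exit code."""
    completed = subprocess.run(tests_command(verbosity), cwd=repo_root)
    return completed.returncode
```

Calling `run_tests(1)` from the top-level directory is then equivalent to typing the command shown above.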

If you write your own test scripts, include test-utils.sh for shared utilities.

Contributors

akashenfelter, hangarr, jaor, mmerce, osroca, sdesimone, whizzmler
