DMI Class implements the DMI imputation algorithm for imputing missing values in a dataset from Rahman, M. G., and Islam, M. Z. (2013): Missing Value Imputation Using Decision Trees and Decision Forests by Splitting and Merging Records: Two Novel Techniques

Home Page: https://csusap.csu.edu.au/~grahman/

DMI

Class that implements the DMI imputation algorithm for imputing missing values in a dataset. DMI uses a C4.5 (J48) decision tree to split the dataset into horizontal segments, which increases the correlation between attributes within each segment for EMI. EMI is then run to impute missing numerical attribute values, and mean/mode imputation (within a leaf) is used to impute missing categorical attribute values. Uses Amri Napolitano's EMI implementation for Weka.
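
As an illustration of the within-leaf step for categorical attributes, the following is a minimal sketch of mode imputation over the records that fall into one leaf. It is not the package's own code: the class name `LeafModeImputer`, the use of `null` to mark a missing value, and the tie-breaking rule (the first value seen wins) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the DMI package.
final class LeafModeImputer {

    /** Returns the most frequent non-null value, or null if every value is missing. */
    static String mode(List<String> values) {
        // LinkedHashMap keeps first-seen order, so ties go to the earliest value (assumption).
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            if (v != null) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Returns a copy of the leaf's values with each missing entry replaced by the leaf's mode. */
    static List<String> impute(List<String> values) {
        String fill = mode(values);
        List<String> out = new ArrayList<>(values.size());
        for (String v : values) {
            out.add(v != null ? v : fill);
        }
        return out;
    }

    public static void main(String[] args) {
        // One categorical attribute for the records in a single leaf; null marks a missing value.
        List<String> leaf = Arrays.asList("red", null, "blue", "red", null);
        System.out.println(impute(leaf));
    }
}
```

Because the mode is computed per leaf rather than over the whole dataset, the fill value reflects only the records that the tree grouped together.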

DMI specification from:

Rahman, M. G., and Islam, M. Z. (2013): Missing Value Imputation Using Decision Trees and Decision Forests by Splitting and Merging Records: Two Novel Techniques, Knowledge-Based Systems, Vol. 53, pp. 51 - 65, ISSN 0950-7051, DOI information: 10.1016/j.knosys.2013.08.023, Available at http://www.sciencedirect.com/science/article/pii/S0950705113002591

For more information, please see Dr Gea Rahman's website: https://csusap.csu.edu.au/~grahman/

BibTeX

@article{rahman2013imputation,
  title={Missing Value Imputation Using Decision Trees and Decision Forests by Splitting and Merging Records: Two Novel Techniques},
  author={Rahman, Md. Geaur and Islam, Md Zahidul},
  journal={Knowledge-Based Systems},
  volume={53},
  pages={51--65},
  year={2013},
  publisher={Elsevier}
}

Installation

Either install DMI from the Weka package manager, or download the latest release from the "Releases" section in the sidebar on GitHub. A video on the installation and use of the package can be found here.

Compilation / Development

Set up a project in your IDE of choice, including weka.jar and EMImputation.jar as compile-time libraries. EMImputation.jar is available in the Weka package manager.

Changes:

  • Leaves that are too small to run EMI are replaced by the nearest node above them in a tree. We call this process "merging".
  • Records that cannot be assigned to any leaves in a tree for imputing an attribute will be assigned to the leaf with the closest centroid.
  • If there are too few records in the whole dataset to ever run EMI (i.e. number of records < (number of numeric attributes in dataset + 2)), no merging takes place.
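
The merging rule in the list above can be sketched as a walk up the tree. This is only an illustration under stated assumptions, not the package's implementation: the `Node` type, its fields, and the method names are hypothetical, and "nearest node above" is read here as the nearest ancestor that holds enough records for EMI.

```java
// Hypothetical sketch of the "merging" rule, not code from the DMI package.
final class MergeSketch {

    /** Minimal tree node: a link to its parent and the number of records that reach it. */
    static final class Node {
        final Node parent;      // null for the root
        final int recordCount;

        Node(Node parent, int recordCount) {
            this.parent = parent;
            this.recordCount = recordCount;
        }
    }

    /** Smallest segment size on which EMI can run, as stated in the list above. */
    static int minRecordsForEmi(int numNumericAttributes) {
        return numNumericAttributes + 2;
    }

    /** Returns the node whose records should be used as the EMI segment for the given leaf. */
    static Node segmentFor(Node leaf, int totalRecords, int numNumericAttributes) {
        int threshold = minRecordsForEmi(numNumericAttributes);
        if (totalRecords < threshold) {
            return leaf; // EMI can never run on this dataset, so no merging takes place
        }
        Node node = leaf;
        // Climb until the node holds enough records; stops at the root at the latest.
        while (node.recordCount < threshold && node.parent != null) {
            node = node.parent;
        }
        return node;
    }

    public static void main(String[] args) {
        Node root = new Node(null, 10);
        Node inner = new Node(root, 4);
        Node leaf = new Node(inner, 2);
        // With 3 numeric attributes the threshold is 5, so the leaf is merged up to the root.
        System.out.println(segmentFor(leaf, 10, 3) == root);
    }
}
```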

Valid options are:

-D minCategoriesForDiscretization - Minimum number of categories for discretization

-N j48MinRecordsInLeaf - Minimum number of records in a J48 tree leaf. A negative value will default to (number of numeric attributes in dataset + 2).

-F j48ConfidenceFactor - Confidence factor for J48

-E minRecordsForEMI - Minimum records in a leaf for EMI to be able to run. A negative value will default to (number of numeric attributes in dataset + 2).

-I emiNumIterations - Iterations for EMI. A negative value will be set to Integer.MAX_VALUE

-L emiLogLikelihoodThreshold - Log likelihood threshold for terminating EMI.
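
To make the negative-value rules for -N, -E and -I concrete, here is a minimal sketch of how such values could be resolved from an options array. It is not the package's option handling: the class `DmiOptionDefaults`, its methods, and the example option values are hypothetical, and treating an absent flag as -1 is an assumption of this example rather than the package's documented default.

```java
// Hypothetical sketch of the negative-value rules, not code from the DMI package.
final class DmiOptionDefaults {

    /** Returns the integer following the given flag, or the fallback if the flag is absent. */
    static int intOption(String[] options, String flag, int fallback) {
        for (int i = 0; i + 1 < options.length; i++) {
            if (options[i].equals(flag)) {
                return Integer.parseInt(options[i + 1]);
            }
        }
        return fallback;
    }

    /** Rule for -N and -E: a negative value becomes (number of numeric attributes + 2). */
    static int resolveMinRecords(int configured, int numNumericAttributes) {
        return configured < 0 ? numNumericAttributes + 2 : configured;
    }

    /** Rule for -I: a negative value becomes Integer.MAX_VALUE. */
    static int resolveEmiIterations(int configured) {
        return configured < 0 ? Integer.MAX_VALUE : configured;
    }

    public static void main(String[] args) {
        String[] options = {"-N", "-1", "-E", "10", "-I", "-1"}; // example values only
        int numNumericAttributes = 4;

        int minLeaf = resolveMinRecords(intOption(options, "-N", -1), numNumericAttributes);
        int minEmi = resolveMinRecords(intOption(options, "-E", -1), numNumericAttributes);
        int iterations = resolveEmiIterations(intOption(options, "-I", -1));

        System.out.println(minLeaf + " " + minEmi + " " + iterations);
    }
}
```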
