Datumbox Framework Zoo: Pre-trained models

This project contains pre-trained Machine Learning models which can be used with the Datumbox Machine Learning Framework v0.8.3-SNAPSHOT (Build 20201014).

Copyright & License

Copyright (c) 2013-2020 Vasilis Vryniotis.

Licensed under the Apache License, Version 2.0.

Pre-trained Models

The project contains the binary files of all the text classification models which are available via the Datumbox API:

  • Sentiment Analysis: The Sentiment Analysis model classifies documents as positive, negative or neutral (lack of sentiment) according to the opinion they express.
  • Twitter Sentiment Analysis: The Twitter Sentiment Analysis model allows you to perform Sentiment Analysis on Twitter. It classifies the tweets as positive, negative or neutral depending on their context.
  • Subjectivity Analysis: The Subjectivity Analysis model categorizes documents as subjective or objective based on their writing style. Texts that express personal opinions are labeled as subjective and the others as objective.
  • Topic Classification: The Topic Classification model assigns documents to one of 12 thematic categories based on their keywords, idioms and jargon. It can be used to identify the topic of a text.
  • Spam Detection: The Spam Detection model labels documents as spam or nospam by taking into account their context. It can be used to filter out spam emails and comments.
  • Adult Content Detection: The Adult Content Detection model classifies the documents as adult or no-adult based on their context. It can be used to detect whether a document contains content unsuitable for minors.
  • Language Detection: The Language Detection model identifies the natural language of the given document based on its words and context. This classifier is able to detect 96 different languages.
  • Commercial Detection: The Commercial Detection model labels the documents as commercial or non-commercial based on their keywords and expressions. It can be used to detect whether a website is commercial or not.
  • Educational Detection: The Educational Detection model classifies the documents as educational or non-educational based on their context. It can be used to detect whether a website is educational or not.

Important Notes:

  • The models support only English.
  • The binary files should be loaded using their corresponding Framework version.
  • All the models should be loaded using the InMemory storage engine.
  • Within the folder of each model you will find a stats.txt file which contains the accuracy metrics of the classifier. The metrics were estimated using 10-fold cross-validation.
  • The remaining API methods that are not included here (Readability Assessment, Keyword Extraction, Text Extraction & Document Similarity) are powered directly by standalone classes of the framework.
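
To review the accuracy metrics of all models at once, the stats.txt files mentioned above can be collected with a small standalone program. The following is a minimal sketch that uses only the Java standard library. The class name StatsFileFinder is hypothetical, the zoo path is a placeholder, and the program assumes nothing about the format of stats.txt: it prints each file verbatim.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the Datumbox Framework.
class StatsFileFinder {

    // Returns every regular file named stats.txt under zooRoot, sorted by path.
    static List<Path> findStatsFiles(Path zooRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(zooRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals("stats.txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path: pass the location of your local clone as the first argument.
        Path zooRoot = Paths.get(args.length > 0 ? args[0] : "/path/to/datumbox-framework-zoo");
        for (Path statsFile : findStatsFiles(zooRoot)) {
            System.out.println("== " + zooRoot.relativize(statsFile) + " ==");
            // Print the file as-is; no assumption is made about its layout.
            Files.readAllLines(statsFile).forEach(System.out::println);
        }
    }
}
```

Run it with the path of your local clone as the only argument.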

How to use

  1. Download/clone this project locally.
  2. Open your datumbox.configuration.properties file and make sure you use the InMemory engine by default:
    configuration.storageConfiguration=com.datumbox.framework.storage.inmemory.InMemoryConfiguration
    
  3. Open your datumbox.inmemoryconfiguration.properties file and update the directory:
    inMemoryConfiguration.directory=/path/to/datumbox-framework-zoo
    
  4. Within your project initialize the classifiers using their name:
    Configuration configuration = Configuration.getConfiguration();
    
    TextClassifier textClassifier = MLBuilder.load(TextClassifier.class, "SentimentAnalysis", configuration);
    System.out.println(textClassifier.predict("Datumbox is amazing!").getYPredicted());
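
If a classifier fails to load, it can help to confirm that the two properties files really contain the values from steps 2 and 3. The following is a minimal sketch of such a check using only the Java standard library. The class name ZooConfigCheck and its methods are hypothetical, and the locations of the two properties files are passed as arguments because they depend on your project setup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check, not part of the Datumbox Framework.
class ZooConfigCheck {

    // Keys and value taken from steps 2 and 3 above.
    static final String ENGINE_KEY = "configuration.storageConfiguration";
    static final String ENGINE_VALUE = "com.datumbox.framework.storage.inmemory.InMemoryConfiguration";
    static final String DIRECTORY_KEY = "inMemoryConfiguration.directory";

    // Parses the text of a .properties file.
    static Properties parse(String text) throws IOException {
        Properties properties = new Properties();
        properties.load(new StringReader(text));
        return properties;
    }

    // Returns a list of human-readable problems; an empty list means both checks passed.
    static List<String> findProblems(Properties mainProps, Properties inMemoryProps) {
        List<String> problems = new ArrayList<>();
        String engine = mainProps.getProperty(ENGINE_KEY, "").trim();
        if (!ENGINE_VALUE.equals(engine)) {
            problems.add(ENGINE_KEY + " is '" + engine + "', expected " + ENGINE_VALUE);
        }
        String directory = inMemoryProps.getProperty(DIRECTORY_KEY, "").trim();
        if (directory.isEmpty()) {
            problems.add(DIRECTORY_KEY + " is not set");
        } else if (!Files.isDirectory(Paths.get(directory))) {
            problems.add(DIRECTORY_KEY + " points to a missing directory: " + directory);
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ZooConfigCheck <datumbox.configuration.properties> "
                    + "<datumbox.inmemoryconfiguration.properties>");
            return;
        }
        Properties mainProps = parse(new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8));
        Properties inMemoryProps = parse(new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8));
        List<String> problems = findProblems(mainProps, inMemoryProps);
        if (problems.isEmpty()) {
            System.out.println("Configuration matches steps 2 and 3.");
        } else {
            problems.forEach(p -> System.out.println("Problem: " + p));
        }
    }
}
```

The check only validates the two settings described in this document; it does not verify that the model files themselves match your Framework version.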

Alternatively, you can skip steps 2 and 3 and set the storage configuration programmatically before initializing the classifier:

Configuration configuration = Configuration.getConfiguration();
InMemoryConfiguration storageConfiguration = new InMemoryConfiguration();
storageConfiguration.setDirectory("/path/to/datumbox-framework-zoo");
configuration.setStorageConfiguration(storageConfiguration);
