younes-abouelnagah / junto

This project forked from scalanlp/junto


This toolkit consists of implementations of various graph-based semi-supervised learning (SSL) algorithms. Currently, three algorithms are implemented: Gaussian Random Fields (GRF), Adsorption, and Modified Adsorption (MAD). Junto also contains Hadoop-based implementations of these three algorithms.

Home Page: https://github.com/parthatalukdar/junto

License: Apache License 2.0

Shell 0.72% Scala 42.90% Java 56.38%


The Junto Label Propagation Toolkit

Label propagation is a popular approach to semi-supervised learning in which nodes in a graph represent features or instances to be classified, and the labels for a classification task are pushed around the graph from nodes that have initial label assignments to their neighbors and beyond.
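The propagation idea can be sketched in a few lines of Scala. This is a minimal illustration, not Junto's API: the object `LabelPropagationSketch`, its `propagate` method, and the adjacency-map graph representation are all hypothetical. It follows the Zhu and Ghahramani style of propagation: seed nodes stay clamped to their initial labels, and every other node repeatedly takes the edge-weighted average of its neighbors' label scores.

```scala
// Hypothetical sketch of clamped, iterative label propagation (not Junto's implementation).
object LabelPropagationSketch {
  // A label distribution: label -> score.
  type Dist = Map[String, Double]

  /** Runs `iterations` rounds of propagation over an undirected weighted graph.
    * graph: node -> (neighbor -> edge weight); every node must appear as a key.
    * seeds: nodes with initial label assignments; they stay clamped throughout. */
  def propagate(
      graph: Map[String, Map[String, Double]],
      seeds: Map[String, Dist],
      iterations: Int
  ): Map[String, Dist] = {
    // Unlabeled nodes start with an empty distribution.
    var current: Map[String, Dist] =
      graph.keys.map(n => n -> seeds.getOrElse(n, Map.empty[String, Double])).toMap
    for (_ <- 1 to iterations) {
      current = graph.map { case (node, neighbors) =>
        seeds.get(node) match {
          case Some(fixed) => node -> fixed // seed nodes keep their initial labels
          case None =>
            val total = neighbors.values.sum
            // Each neighbor contributes its scores, scaled by its share of the edge weight.
            val contributions = for {
              (nbr, w) <- neighbors.toSeq
              (label, s) <- current.getOrElse(nbr, Map.empty[String, Double]).toSeq
            } yield label -> w * s / total
            node -> contributions.groupBy(_._1).map { case (l, xs) => l -> xs.map(_._2).sum }
        }
      }
    }
    current
  }
}
```

On a three-node chain a - b - c with a seeded as X and c seeded as Y, the unlabeled middle node b ends up with a score of 0.5 for each label; raising the weight of the a - b edge would pull b toward X.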

This package provides implementations of the Adsorption and Modified Adsorption (MAD) algorithms.

Please cite Talukdar and Crammer (2009) and/or Talukdar and Pereira (2010) if you use this library.
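Adsorption and MAD can be viewed as random walks in which each node has an injection, a continuation, and an abandonment probability. The sketch below computes these three probabilities for one node using the entropy-based heuristic as we understand it from Talukdar and Crammer (2009); treat it as an illustration under that assumption, since Junto's own code may differ in details. The names `WalkProbabilities`, `forNode`, and `Probs` are hypothetical, and `beta` is assumed to be a tuning parameter greater than 1.

```scala
// Hypothetical sketch of the per-node random-walk probabilities used by Adsorption/MAD,
// following our reading of the entropy heuristic in Talukdar and Crammer (2009).
object WalkProbabilities {
  final case class Probs(inject: Double, cont: Double, abandon: Double)

  /** weights: the node's edge weights; isSeed: whether it has an initial label; beta > 1. */
  def forNode(weights: Seq[Double], isSeed: Boolean, beta: Double): Probs = {
    val total = weights.sum
    // Entropy of the node's transition distribution (weights normalized to probabilities).
    val h = weights.filter(_ > 0).map { w => val p = w / total; -p * math.log(p) }.sum
    // High-entropy nodes (many similar-weight neighbors) get a lower continuation score.
    val c = math.log(beta) / math.log(beta + math.exp(h))
    // Only seed nodes can inject their initial labels.
    val d = if (isSeed) (1 - c) * math.sqrt(h) else 0.0
    val z = math.max(c + d, 1.0)
    val pCont = c / z
    val pInj = d / z
    // Whatever probability mass remains goes to abandonment (the "dummy" label).
    Probs(pInj, pCont, 1.0 - pCont - pInj)
  }
}
```

For an unlabeled node with two equally weighted edges and `beta = 2.0`, the entropy is log 2, so the continuation probability is log 2 / log 4 = 0.5 and the remaining 0.5 goes to abandonment.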

LP_ZGL, one of the first label propagation algorithms, is also implemented:

  • Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.

Why is the toolkit named Junto? The core code was written while Partha Talukdar was at the University of Pennsylvania, and Ben Franklin (the founder of the University) established a club called the Junto that provided a structured forum for him and his friends to debate and exchange knowledge. This parallels how label propagation works: nodes are connected and influence each other through their connections. Also, "junto" means "along" and "together" in a number of Romance languages and carries the connotation of cooperation, which is also a good fit for label propagation.

What's inside

The latest stable release of Junto is 1.6.0. Here are the changes from version 1.5:

  • Renamed the package from upenn.junto to junto (imports change from upenn.junto._ to junto._).
  • Added junto.JuntoContext, which provides helper functions for interacting with graphs and extracting results after running label propagation.
  • Added a prepositional phrase attachment test.
  • Added some helpers for creating graphs with nodes of different types, e.g. junto.config.VertexName.
  • Now using Scallop for command-line parsing (instead of Argot).

See the CHANGELOG for changes in previous versions.

Using Junto

In SBT:

libraryDependencies += "org.scalanlp" % "junto" % "1.6.0"

In Maven:

<dependency>
   <groupId>org.scalanlp</groupId>
   <artifactId>junto</artifactId>
   <version>1.6.0</version>
</dependency>

Requirements

Configuring your environment variables

The easiest thing to do is to set the environment variables JAVA_HOME and JUNTO_DIR to the relevant locations on your system. Set JAVA_HOME to the top-level directory of the Java installation you want to use.

Next, set JUNTO_DIR to the top-level directory where you unzipped the download, and then add the directory JUNTO_DIR/bin to your path.

Once you have taken care of these three things, you should be able to build and use the Junto Library.
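If something does not work, it can help to confirm the setup programmatically. The following is a hypothetical helper, not part of Junto: `EnvCheck.problems` takes the environment as a plain map (so it can be tested without touching the real environment), and `main` runs it against the actual process environment via `sys.env`. It only checks that the variables are set and that JUNTO_DIR/bin appears verbatim as a PATH entry; it does not verify that the directories exist.

```scala
// Hypothetical helper (not part of Junto) that checks the three setup steps described above.
object EnvCheck {
  /** Returns a list of problems; an empty list means the environment looks configured. */
  def problems(env: Map[String, String]): List[String] = {
    // JAVA_HOME and JUNTO_DIR must both be set to non-empty values.
    val missing = List("JAVA_HOME", "JUNTO_DIR")
      .filterNot(k => env.get(k).exists(_.nonEmpty))
      .map(k => s"$k is not set")
    // PATH entries are separated by ':' on Unix and ';' on Windows.
    val pathEntries = env.getOrElse("PATH", "").split(java.io.File.pathSeparator).toList
    val binMissing = env.get("JUNTO_DIR") match {
      case Some(dir) if dir.nonEmpty =>
        val bin = new java.io.File(dir, "bin").getPath
        if (pathEntries.contains(bin)) Nil else List(s"$bin is not on PATH")
      case _ => Nil // already reported as missing above
    }
    missing ++ binMissing
  }

  def main(args: Array[String]): Unit = {
    val ps = problems(sys.env)
    if (ps.isEmpty) println("Environment looks OK") else ps.foreach(println)
  }
}
```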

Building the system from source

Junto uses SBT (Simple Build Tool) with a standard directory structure. To build Junto, go to JUNTO_DIR and type:

$ ./build update compile

This will compile the source files and put them in ./target/classes. If this is your first time running it, you will see messages about Scala being downloaded; this is fine and expected. Once that is over, the Junto code will be compiled.

To try out other build targets, do:

$ ./build

This will drop you into the SBT interface. Many other build targets are supported.

Trying it out

If you've managed to configure and build the system, you should be able to go to $JUNTO_DIR/examples/simple and run:

$ junto config simple_config

See the examples/simple/simple_config file for the available options. Sample (dummy) data is provided in the examples/simple/data directory.
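To experiment with graph data outside the command-line tool, one option is to load an edge list into memory. The sketch below assumes that each line of a graph file has the form source<TAB>target<TAB>weight; that layout is an assumption, so check it against the files in examples/simple/data before relying on it. `EdgeListSketch.parse` is a hypothetical name, not a Junto function.

```scala
// Hypothetical loader for a tab-separated edge list (assumed layout: source \t target \t weight).
object EdgeListSketch {
  /** Builds a symmetric adjacency map: node -> (neighbor -> weight).
    * Duplicate edges have their weights summed; malformed lines are skipped. */
  def parse(lines: Iterator[String]): Map[String, Map[String, Double]] = {
    val edges = lines.map(_.trim).filter(_.nonEmpty).flatMap { line =>
      line.split("\t") match {
        case Array(src, dst, w) =>
          // Skip lines whose weight is not a number.
          scala.util.Try(w.toDouble).toOption.toSeq.flatMap { weight =>
            // Treat edges as undirected; a self-loop is recorded once.
            if (src == dst) Seq((src, dst, weight))
            else Seq((src, dst, weight), (dst, src, weight))
          }
        case _ => Seq.empty // wrong number of fields
      }
    }.toSeq
    edges.groupBy(_._1).map { case (node, es) =>
      node -> es.groupBy(_._2).map { case (nbr, ws) => nbr -> ws.map(_._3).sum }
    }
  }
}
```

For example, `EdgeListSketch.parse(scala.io.Source.fromFile(path).getLines())` would load a file at a path of your choosing.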

A more extensive example of prepositional phrase attachment is in src/test/scala/junto/prepattach.scala. Look at that file for an example of using Junto as an API to construct a graph and run label propagation.

Hadoop

If you are interested in trying out the Hadoop implementations, then please look into examples/hadoop/README.

Getting help

Documentation is admittedly thin. If you get stuck, you can get help by posting questions to the junto-open group.

Also, if you find what you believe is a bug or have a feature request, you can create an issue.

Contributors

jasonbaldridge, dhgarrette
