
morphism's Introduction

Overview

The goal is to find out whether there is a correspondence between the nodes used in OpenCog intensional reasoning and the embedding vectors built for those nodes using various methods. Motivation and preliminary results are detailed in:

Two methods for building the embedding vectors are currently implemented:

  1. DeepWalk

This method uses the Word2Vec language model to build embeddings for the nodes in the AtomSpace (as opposed to words in a linguistic context). The "sentences" are built by taking "walks" from one node to another through their interconnected links. The likelihood of a walk being selected is based on the TruthValue of each link connected to the node being visited during "sentence" construction (as opposed to selecting uniformly at random, as in the original DeepWalk algorithm). However, the datasets used here are fairly simple and the properties associated with the concepts are "crisp", so the walks are effectively chosen at random. When a different dataset is used, additional logic should be implemented to handle probabilistic walk selection.
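As a rough illustration of TruthValue-weighted walk selection, here is a minimal sketch. It is not the code in this repository: the function names, the toy graph, and the choice to use a single link strength as the weight are all assumptions made for the example.

```python
import random


def weighted_walk(graph, start, length, rng):
    """Take one walk of up to `length` nodes, picking each next node with
    probability proportional to the weight of the connecting link."""
    walk = [start]
    current = start
    for _ in range(length - 1):
        neighbours = graph.get(current)
        if not neighbours:
            break  # dead end: stop the walk early
        nodes = list(neighbours)
        weights = [neighbours[n] for n in nodes]
        current = rng.choices(nodes, weights=weights, k=1)[0]
        walk.append(current)
    return walk


def build_sentences(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node; each walk is one "sentence"."""
    rng = random.Random(seed)
    return [weighted_walk(graph, node, length, rng)
            for node in graph
            for _ in range(walks_per_node)]


# Hypothetical toy graph: node -> {neighbour: link strength}.
# With "crisp" strengths of 1.0 the weighted choice reduces to a uniform one.
toy_graph = {
    "person-1": {"likes-music": 1.0, "plays-guitar": 1.0},
    "person-2": {"likes-music": 1.0},
    "likes-music": {"person-1": 1.0, "person-2": 1.0},
    "plays-guitar": {"person-1": 1.0},
}
sentences = build_sentences(toy_graph, walks_per_node=2, length=4)
```

The resulting list of token lists could then be passed to a Word2Vec implementation, for example as the `sentences` argument of gensim's `Word2Vec` class.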

  2. Fuzzy-membership based

The embedding of a node is built with one entry per property found in the AtomSpace. The value of each entry is the TruthValue of the corresponding AttractionLink, which reflects whether that property is more likely to be a property of this node than of other nodes.
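The fuzzy-membership construction can be sketched as follows. This is an illustration under assumptions, not the repository's code: the input is a hypothetical plain dictionary rather than an AtomSpace, only a single number (taken here to be the strength of the TruthValue) is stored per AttractionLink, and a missing link is treated as 0.

```python
def membership_vectors(attraction, default=0.0):
    """Build one vector per node, with one entry per property.

    `attraction` maps node -> {property: AttractionLink strength}.
    Properties a node has no AttractionLink to get `default`.
    """
    # Fix a global, sorted property order so that all vectors are aligned.
    properties = sorted({p for props in attraction.values() for p in props})
    vectors = {
        node: [props.get(p, default) for p in properties]
        for node, props in attraction.items()
    }
    return properties, vectors


# Hypothetical AttractionLink strengths, made up for illustration.
attraction = {
    "person-1": {"likes-music": 0.8, "plays-guitar": 0.6},
    "person-2": {"likes-music": 0.3, "speaks-french": 0.9},
}
properties, vectors = membership_vectors(attraction)
```

Because every node gets an entry for every property in the whole AtomSpace, most entries are 0 for real datasets, which is why these vectors tend to be sparse.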

After building the vectors, a dimensionality reduction technique can optionally be applied. This is recommended for the vectors built using the 2nd method, because those are very sparse in most cases. Two techniques are implemented at the moment -- PCA and KPCA. For KPCA, two kernel functions can be used:

  1. Tanimoto -- Compute the Tanimoto distance between two vectors
  2. Fuzzy Jaccard -- Compute the intensional similarity using the actual calculation used in the PLN intensional similarity direct introduction rule
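For reference, the two kernels can be sketched in their generic textbook forms. This is an assumption-laden illustration: the Tanimoto function below returns a similarity (the corresponding distance would be 1 minus this value), the fuzzy Jaccard function uses the common min/max form and is not guaranteed to match the exact calculation in the PLN rule, and the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Convention chosen for this sketch: two all-zero vectors count as identical.
    return dot / denom if denom else 1.0


def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity: sum of minima over sum of maxima."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 1.0


def gram_matrix(vectors, kernel):
    """Pairwise kernel values for a list of vectors."""
    return [[kernel(u, v) for v in vectors] for u in vectors]


a = [0.8, 0.6, 0.0]
b = [0.3, 0.0, 0.9]
gram = gram_matrix([a, b], fuzzy_jaccard)
```

A Gram matrix computed this way could be handed to a kernel PCA implementation that accepts precomputed kernels, such as scikit-learn's `KernelPCA(kernel="precomputed")`.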

An experiment involves the following steps:

  1. Convert the data into Atomese
  2. Generate SubsetLinks and AttractionLinks for the concepts that we are interested in, and calculate TruthValues for them (for PLN intensional reasoning)
  3. Create embedding vectors for the same set of concepts, using either one of the available embedding methods
  4. Randomly select pairs of concepts, and for each pair, compute the intensional similarity and vector distance between them
  5. Calculate the correlation between the intensional similarities and vector distances calculated in the previous step
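Steps 4 and 5 can be sketched as below. The details are assumptions: the source does not say which correlation coefficient or which vector distance is used, so Pearson correlation and Euclidean distance are stand-ins, and the similarity values are made up rather than computed by PLN.

```python
import math
import random


def sample_pairs(concepts, n_pairs, seed=0):
    """Randomly pick `n_pairs` pairs of two distinct concepts."""
    rng = random.Random(seed)
    return [tuple(rng.sample(concepts, 2)) for _ in range(n_pairs)]


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


# Hypothetical embedding vectors and intensional similarities (made up).
vectors = {"a": [0.8, 0.6, 0.0], "b": [0.3, 0.0, 0.9], "c": [0.7, 0.5, 0.1]}
similarity = {
    frozenset(("a", "b")): 0.13,
    frozenset(("a", "c")): 0.85,
    frozenset(("b", "c")): 0.20,
}

pairs = sample_pairs(sorted(vectors), n_pairs=10)
sims = [similarity[frozenset(p)] for p in pairs]
dists = [math.dist(vectors[x], vectors[y]) for x, y in pairs]
r = pearson(sims, dists)
```

If the embedding preserves intensional similarity, one would expect similar concepts to lie close together, i.e. a negative correlation between similarity and distance.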

To Run

First, check out, build, and install the pln-morphism branch of the PLN repo at https://github.com/leungmanhin/pln/tree/pln-morphism. It includes a minor modification that always takes the confidence of a TruthValue into account in the intensional similarity calculation, which makes the calculation a little more consistent.

In this repository, there are two Python scripts, corresponding to the toy dataset, which is made for testing purposes only, and the Social Network: MOOC User Action Dataset, respectively.

There is an additional script, main.py, which controls which functions are called for the experiment, e.g. loading embeddings vs. generating embeddings. Be sure to check what is in there, and comment out or uncomment the parts as needed, before running it with the following command:

python3 -B main.py

The results will be written to a CSV file under the results directory.

morphism's People

Contributors

leungmanhin, noskill, tanksha
