
ai-team-uoa / pyjedai

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

Home Page: https://pyjedai.readthedocs.io

License: Apache License 2.0

Python 99.98% Dockerfile 0.02%
deduplication entity-matching entity-resolution python link-discovery data-matching fuzzy-matching machine-learning data-disambigation duplicate-detection

pyjedai's Introduction


pyJedAI


An open-source library that leverages Python’s data science ecosystem to build
powerful end-to-end Entity Resolution workflows.

Overview

pyJedAI is a Python framework that aims to offer both experts and novice users robust and fast solutions for multiple types of Entity Resolution problems. It is built on state-of-the-art Python frameworks. pyJedAI constitutes the sole open-source Link Discovery tool that is capable of exploiting the latest breakthroughs in Deep Learning and NLP techniques, which are publicly available through the Python data science ecosystem. This applies to both blocking and matching, thus ensuring high time efficiency, high scalability, and high effectiveness, without requiring any labelled instances from the user.
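To make the two stages named above concrete, here is a minimal, library-free sketch of an unsupervised workflow: token blocking to generate candidate pairs, followed by Jaccard-similarity matching. It does not use pyJedAI's API; the function names, the sample records, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def tokens(record):
    """Lower-cased word tokens from all attribute values of a record."""
    return {tok for value in record.values() for tok in str(value).lower().split()}


def token_blocking(records):
    """Blocking: map each token to the ids of the records that contain it."""
    blocks = {}
    for rid, record in records.items():
        for tok in tokens(record):
            blocks.setdefault(tok, set()).add(rid)
    # A block with a single record yields no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Distinct record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(records, threshold=0.5):
    """Matching: keep candidate pairs whose similarity reaches the threshold."""
    toks = {rid: tokens(rec) for rid, rec in records.items()}
    pairs = candidate_pairs(token_blocking(records))
    return sorted((a, b) for a, b in pairs if jaccard(toks[a], toks[b]) >= threshold)


# Hypothetical records with differing attribute names (semi-structured input).
records = {
    1: {"name": "Apple iPhone 13", "color": "black"},
    2: {"title": "iphone 13 apple black 128GB"},
    3: {"name": "Samsung Galaxy S21", "color": "black"},
}
matches = match(records)
print(matches)
```

Blocking only restricts which pairs are compared; the similarity threshold decides which of those pairs are reported as duplicates. No labelled examples are involved at either step.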

Key Features

  • Independent of input data type: both structured and semi-structured data can be processed.
  • A wide range of implemented algorithms.
  • Easy to use.
  • Builds on well-known, cutting-edge machine learning packages.
  • Offers supervised and unsupervised ML techniques.

Open demos are available in:

Google Colab Hands-on demo:

Install

pyJedAI has been tested on Windows and Linux.

Basic requirements:

  • Python 3.8 or later.
  • On Windows, Microsoft Visual C++ 14.0 is required. Download it from the official Microsoft site.

PyPI

Install the latest version of pyjedai:

pip install pyjedai

More on PyPI.
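After installing, a quick sanity check can confirm that the interpreter meets the stated Python requirement and that the package is visible to it. The sketch below uses only the standard library; the helper names are illustrative, and the distribution name `pyjedai` is taken from the pip command above.

```python
import sys
from importlib import metadata

# Minimum interpreter version stated in the basic requirements.
MIN_PYTHON = (3, 8)


def python_ok(version_info=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name="pyjedai"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python OK:", python_ok())
print("pyjedai version:", installed_version() or "not installed")
```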

Git

Set up locally:

git clone https://github.com/AI-team-UoA/pyJedAI.git

Go to the root directory with cd pyJedAI and run:

pip install .

Docker

Available at Docker Hub, or clone this repo and:

docker build -f Dockerfile .

Dependencies

See the full list of dependencies and all versions used in this file.

Bugs, Discussions & News

GitHub Discussions is the forum for general questions and our recommended starting point. Please report any bugs that you find here.

Java - Web Application

Java users can check out the original JedAI, which provides Java-based code and a web application for interactively creating ER workflows.

JedAI constitutes an open-source, highly scalable toolkit that offers out-of-the-box solutions for any data integration task, e.g., Record Linkage, Entity Resolution, and Link Discovery. At its core lies a set of domain-independent, state-of-the-art techniques that apply to both RDF and relational data.


Team & Authors

This is a research project by the AI-Team of the Department of Informatics and Telecommunications at the University of Athens.

License

Released under the Apache-2.0 license (see LICENSE.txt).

Copyright © 2024 AI-Team, University of Athens



       

This project is funded in the context of STELAR, a Horizon Europe project.
