ner_for_protein_structures

This package accompanies a scientific publication describing the development of a human-in-the-loop named entity recognition algorithm specific to protein structures.

Here we provide a number of command line tools to convert annotations found in BioC formatted XML files, as they have been exported from our annotation tool TeamTat (https://www.teamtat.org/), into other formats.

For more details read the documentation here: https://ner-for-protein-structures.readthedocs.io/en/latest/
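To give a feel for the kind of conversion described above, here is a minimal sketch that reads annotations from a BioC XML string and writes them out as CSV. It is not one of the package's command line tools; it uses only the Python standard library, and the sample document, the entity type labels ("evidence", "protein") and the function names are assumptions made up for illustration. The element layout (document, passage, annotation, infon, location) follows the general BioC XML convention.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny hand-written BioC sample (illustrative only, not project data).
SAMPLE_BIOC = """<collection>
  <source>example</source>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>The crystal structure of lysozyme was solved.</text>
      <annotation id="1">
        <infon key="type">evidence</infon>
        <location offset="4" length="17"/>
        <text>crystal structure</text>
      </annotation>
      <annotation id="2">
        <infon key="type">protein</infon>
        <location offset="25" length="8"/>
        <text>lysozyme</text>
      </annotation>
    </passage>
  </document>
</collection>
"""

FIELDS = ["doc_id", "ann_id", "type", "offset", "length", "text"]


def bioc_annotations(xml_text):
    """Return one dict per <annotation> element found in a BioC XML string."""
    root = ET.fromstring(xml_text)
    rows = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for annotation in document.iter("annotation"):
            # Infons are key/value metadata; the entity type is stored under "type".
            infons = {i.get("key"): i.text for i in annotation.findall("infon")}
            location = annotation.find("location")
            rows.append({
                "doc_id": doc_id,
                "ann_id": annotation.get("id"),
                "type": infons.get("type"),
                "offset": int(location.get("offset")),
                "length": int(location.get("length")),
                "text": annotation.findtext("text"),
            })
    return rows


def to_csv(rows):
    """Serialise annotation rows to a CSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(bioc_annotations(SAMPLE_BIOC)))
```

For real exports from TeamTat, the documentation linked above describes the supported output formats and the actual tools to use.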

Installation

Clone the repository from GitHub:

git clone https://github.com/PDBeurope/ner_for_protein_structures.git

It is good practice to create a virtual environment for development:

python3 -m venv ner_venv

Now activate the venv.

source ner_venv/bin/activate

Next, install the dependencies listed in the provided requirements.txt:

pip install -r requirements.txt

To use some of the NLP tools, also install the scientific English language model:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_sm-0.5.3.tar.gz
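A quick way to confirm that the language model was installed into the active virtual environment is to check whether its package can be found. The sketch below assumes the model installs as an importable package named en_core_sci_sm (the name in the download URL above); the helper name model_installed is hypothetical.

```python
import importlib.util


def model_installed(package_name="en_core_sci_sm"):
    """Return True if a top-level package with this name can be located."""
    # find_spec returns None when the package is not installed.
    return importlib.util.find_spec(package_name) is not None


if model_installed():
    print("en_core_sci_sm is available")
else:
    print("en_core_sci_sm was not found; re-run the pip install step above")
```

If the check fails, make sure the virtual environment is activated before running pip.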

Downloading models

The available models, their performance statistics and download links are listed in the "Models" section of the documentation. Hugging Face supports git, so every model can be downloaded with "git clone", as in the example below. However, the binary model files are too large for standard git, so the large-file handler (Git LFS) has to be set up before cloning, for example from the parent directory the model will be cloned into.

git lfs install

Once the large-file handler is installed, the models can be cloned from Hugging Face as in the example below.

git clone https://huggingface.co/PDBEurope/Bioformer8L-ProteinStructure-NER-v0.1

Alternatively, the models can be accessed through Hugging Face's inference API. This option requires a Hugging Face account and an authentication token. Details on how to register and how to set up the token can be found on Hugging Face.
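As a rough illustration of the inference API route, the sketch below builds an authenticated request for the model named above using only the Python standard library. The endpoint URL pattern and the {"inputs": ...} payload shape are assumptions based on Hugging Face's hosted inference API and may differ from the current service, so check the Hugging Face documentation before relying on them. The function names and the token value are placeholders.

```python
import json
import urllib.request

MODEL_ID = "PDBEurope/Bioformer8L-ProteinStructure-NER-v0.1"
# Assumed endpoint pattern; verify against the current Hugging Face docs.
API_URL = "https://api-inference.huggingface.co/models/" + MODEL_ID


def build_request(text, token):
    """Create a POST request carrying the text and the bearer token."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(text, token):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, token), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not contact the network; calling query() does.
request = build_request("The crystal structure of lysozyme was solved.", "hf_xxx")
print(request.get_method(), request.full_url)
```

Calling query() with a valid token sends the text to the hosted model; keep the token out of source control, for example by reading it from an environment variable.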

Annotation handbook and TeamTat user guide

The annotation handbook with details on how to annotate different entity types and the user guide on the annotation tool TeamTat can be found here: Annotation handbook and user guide

Support

For feedback, help, or bug reports, please email: [email protected]

Authors

This repository was developed at the European Bioinformatics Institute.

  • Lead Developer: Melanie Vollmar

License

This project is covered by an MIT license.
