translation-memory-tools

Introduction

This is the toolset used at Softcatalà to build translation memories for all the projects that we know have Catalan translations that are openly available. You can browse the memories online at https://www.softcatala.org/recursos/memories/

The toolset consists of the following components, each with its own responsibility:

Builder (fetch and build memories)

  • Download and unpack the files from source repositories
  • Convert from the different translation formats (ts, strings, etc.) to PO
  • Create a translation memory for each project in PO and TMX formats
  • Produce a single translation memory file that contains all the projects
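
As a rough illustration of the TMX step, the sketch below writes source/target pairs into a minimal TMX 1.4 document using only the Python standard library. It is not the builder's actual code: the function name build_tmx, the header values and the "en"/"ca" language codes are assumptions made for the example. Producing a single memory for all projects would then amount to passing the combined pairs of every project.

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, source_lang="en", target_lang="ca"):
    """Return a minimal TMX 1.4 document (as a string) for (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(
        tmx,
        "header",
        {
            "creationtool": "example-sketch",  # hypothetical tool name
            "creationtoolversion": "0.1",
            "segtype": "sentence",
            "o-tmf": "po",
            "adminlang": source_lang,
            "srclang": source_lang,
            "datatype": "plaintext",
        },
    )
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit (tu) holds one variant (tuv) per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx([("File", "Fitxer"), ("Open", "Obre")]))
```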

Web

  • Provides an API that allows users to download memories and search translations
  • Provides an index-creator that creates a Whoosh index with all the strings, which the user can then search using the web app
  • Provides a download-creation that creates a zip file with all the memories that the user can download
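
The packaging step can be pictured with a short standard-library sketch. This is only an assumption about how such a step might look, not the project's download-creation code: the function name zip_memories, the flat directory layout and the .po/.tmx suffix filter are all hypothetical.

```python
import zipfile
from pathlib import Path


def zip_memories(memory_dir, archive_path, suffixes=(".po", ".tmx")):
    """Pack every memory file found directly in memory_dir into one zip archive."""
    memory_dir = Path(memory_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(memory_dir.iterdir()):
            # Only regular files with a memory suffix are packed.
            if path.is_file() and path.suffix in suffixes:
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

Calling zip_memories("memories", "memories.zip") would return the names of the files that were packed, which makes it easy to log or verify what ended up in the download.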

Terminology (terminology extraction)

  • Analyzes the PO files and creates a report with the most common terminology across the projects
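
The idea of counting common terminology can be sketched as a simple word-frequency count over source strings. This is a deliberately naive stand-in, not the extraction method the Terminology component uses; the function name most_common_terms and the min_length filter are assumptions for illustration, and the input is assumed to be a list of strings already read from the PO files.

```python
import re
from collections import Counter

# Runs of letters only (digits and underscores are excluded).
WORD = re.compile(r"[^\W\d_]+")


def most_common_terms(source_strings, top=10, min_length=3):
    """Count lowercase words across all strings and return the most frequent ones."""
    counts = Counter()
    for text in source_strings:
        for word in WORD.findall(text.lower()):
            if len(word) >= min_length:
                counts[word] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    strings = ["Open file", "Save file", "Close the file", "Open folder"]
    print(most_common_terms(strings, top=2))
```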

Quality (feedback on how to improve translations)

  • Runs Pology and LanguageTool and generates HTML reports on translation quality

Installation

Setting up before execution

To download the translations of some projects, you need credentials, such as API keys, for the systems that host them.

builder.py expects the credentials to be defined in the following locations:

  • At cfg/credentials in the different YAML files: for Zanata (zanata.yaml), for Weblate (weblate.yaml) and for Crowdin (crowdin.yaml). The -sample files provide examples of how these files should be structured.
  • For Transifex, the credentials should be at ~/.transifexrc, since this is where the Transifex CLI tool expects them.
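
Before a long run it can be handy to check that the files listed above are in place. The following is a small sketch of such a check and is not part of the toolset: the function name missing_credentials and its parameters are hypothetical, and it only tests that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the locations described above.
YAML_CREDENTIALS = ("zanata.yaml", "weblate.yaml", "crowdin.yaml")


def missing_credentials(credentials_dir="cfg/credentials", home=None):
    """Return the expected credential files that do not exist yet."""
    home = Path(home) if home is not None else Path.home()
    expected = [Path(credentials_dir) / name for name in YAML_CREDENTIALS]
    expected.append(home / ".transifexrc")
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    for path in missing_credentials():
        print(f"missing: {path}")
```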

All these projects require you to have the right credentials and often to be a "member of the Catalan project" to be able to download the translations.

If you are building a local Docker image, place your Transifex credentials file in the cfg/credentials/transifexrc directory, and it will be copied to the right location in the Docker image. Remember that the Docker build context cannot access your ~ directory.

Running the builder code locally

This section helps you run the builder component locally, in case you want to quickly test new project configurations. For any other use case, we recommend using Docker.

Debian:

sudo apt-get update -y && sudo apt-get install python3-dev libhunspell-dev libyaml-dev gettext zip mercurial bzr ruby git curl wget g++ subversion bzip2 python2-dev -y
curl https://raw.githubusercontent.com/transifex/cli/master/install.sh | bash && sudo mv ./tx /usr/bin/
sudo gem install i18n-translators-tools
pip3 install -r requirements.txt

macOS:

brew install python3 breezy hunspell libyaml gettext zip mercurial ruby git curl wget gcc subversion bzip2
curl https://raw.githubusercontent.com/transifex/cli/master/install.sh | bash
sudo gem install i18n-translators-tools
pip3 install -r requirements.txt

For example, to download only the Abiword project:

cd src
./builder.py -p Abiword
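
When testing several project configurations in a row, the same invocation can be scripted. The sketch below assumes only the -p flag shown above and runs the builder once per project, since whether -p accepts several names at once is not stated here. The helper names builder_command and run_builder are hypothetical, and the script is assumed to be started from the repository root.

```python
import subprocess
import sys


def builder_command(project):
    """Build the argument list for a single builder.py run."""
    return [sys.executable, "builder.py", "-p", project]


def run_builder(projects, cwd="src"):
    """Run the builder once per project and stop at the first failure."""
    for project in projects:
        subprocess.run(builder_command(project), cwd=cwd, check=True)


if __name__ == "__main__":
    run_builder(["Abiword"])
```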

Running the system locally using Docker

This requires docker, docker-compose and make to be installed on your system.

First download the data for the projects and generate the data quality reports:

make docker-run-builder

Downloading all the projects can take up to a day, which is not acceptable for a development cycle. In docker/local.yml, the variable DEV_SMALL_SET forces the builder to download only a small subset of projects. This subset does not require any credentials to download.

The output files are copied to the web-docker local directory to make it easy for you to explore the results.

To run the web app which provides the microservices for the web site:

make docker-run-webapp

To test it from the browser:

Contributing

If you are looking for how to contribute to the project, see HOW-TO.md

Contact Information

Jordi Mas: [email protected]
