CTranslate

CTranslate is a C++ implementation of OpenNMT's translate.lua script with no LuaTorch dependencies. It facilitates the use of OpenNMT models in existing products and on various platforms using Eigen as a backend.

CTranslate provides optimized CPU translation and can optionally offload matrix multiplication to a CUDA-compatible device using cuBLAS. It only supports OpenNMT models released with the release_model.lua script.

Dependencies

  • Eigen >= 3.3
  • Boost (program_options, when -DLIB_ONLY=OFF)

Optional

  • CUDA for offloading matrix multiplication to a GPU
  • Intel® MKL for an alternative BLAS backend

Compiling

CMake and a compiler that supports the C++11 standard are required to compile the project.

git submodule update --init
mkdir build
cd build
cmake ..
make

This produces the dynamic library libonmt.so (.dylib on macOS, .dll on Windows) and the translation client cli/translate.

CTranslate also bundles OpenNMT's Tokenizer which provides the tokenization tools lib/tokenizer/cli/tokenize and lib/tokenizer/cli/detokenize.

Options

  • To give a hint about where Eigen is located, use the -DEIGEN_ROOT=<path to Eigen library> option.
  • To compile only the library, use the -DLIB_ONLY=ON flag.
  • To disable OpenMP, use the -DWITH_OPENMP=OFF flag.

Performance tips

  • Unless you are cross-compiling for a different architecture, add -DCMAKE_CXX_FLAGS="-march=native" to the cmake command above to optimize for speed.
  • Consider installing Intel® MKL when you are targeting Intel®-powered platforms. If found, the project will automatically link against it.

Using

Clients

Run the clients with --help to discover the available options and usage. They have the same interface as their Lua counterparts.

Library

This project is also a convenient way to load OpenNMT models and translate texts in existing software.

Here is a very simple example:

#include <iostream>

#include <onmt/onmt.h>

int main()
{
  // Create a new Translator object.
  auto translator = onmt::TranslatorFactory::build("enfr_model_release.t7");

  // Translate a tokenized sentence.
  std::cout << translator->translate("Hello world !") << std::endl;

  return 0;
}
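
The input passed to translate must already be tokenized, that is, tokens separated by single spaces as in "Hello world !". The sketch below only illustrates that expected shape. naive_tokenize is a hypothetical helper, not part of CTranslate or of the bundled Tokenizer, and it handles nothing beyond ASCII whitespace and punctuation. For real text, prefer the bundled tokenization tools.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (an assumption for illustration, not a CTranslate API):
// isolates each ASCII punctuation mark as its own token and collapses
// runs of whitespace into a single space.
std::string naive_tokenize(const std::string& text)
{
  std::string out;
  for (unsigned char c : text)
  {
    if (std::isspace(c))
    {
      // Emit at most one space between tokens, and none at the start.
      if (!out.empty() && out.back() != ' ')
        out += ' ';
    }
    else if (std::ispunct(c))
    {
      // Separate the punctuation mark from the preceding token.
      if (!out.empty() && out.back() != ' ')
        out += ' ';
      out += static_cast<char>(c);
      out += ' ';
    }
    else
    {
      out += static_cast<char>(c);
    }
  }
  // Drop the trailing space left after final punctuation or whitespace.
  if (!out.empty() && out.back() == ' ')
    out.pop_back();
  return out;
}
```

With this helper, naive_tokenize("Hello world!") returns "Hello world !", which matches the form used in the example above.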

For more advanced usage, see:

  • include/onmt/TranslatorFactory.h to instantiate a new translator
  • include/onmt/ITranslator.h (the Translator interface) to translate sequences or batches of sequences
  • include/onmt/TranslationResult.h to retrieve results and attention vectors
  • include/onmt/Threads.h to programmatically control the number of threads to use

Also see the headers available in the Tokenizer that are accessible when linking against CTranslate.

Unsupported features

Some model configurations are currently unsupported:

  • GRU
  • deep bidirectional encoder
  • pyramidal deep bidirectional encoder
  • concat variant of global attention
  • bridges other than copy

Additionally, CTranslate lacks some advanced features of translate.lua:

  • gold data score
  • best N hypotheses
  • hypotheses filtering
  • beam search normalization
