
lindera's People

Contributors

abetomo, bluegreenmagick, chatblanc-ciel, dependabot[bot], encody, gitter-badger, higumachan, ikawaha, johtani, kerollmops, kitaitimakoto, manythefish, mochi-sann, mocobeta, mosuka, tokuhirom

lindera's Issues

Create a user dictionary package

Separate the functionality of the user dictionary contained in the lindera-ipadic-builder package into separate packages.

For example: lindera-user-dic-builder

Lindera doesn’t build

Currently, we can’t import lindera in the latest version. It doesn’t build, and since the change has been pushed as a minor version, it probably broke every project relying on lindera.

...
    Checking lindera-decompress v0.13.5
    Checking bstr v0.2.17
    Checking lindera-core v0.13.5
    Checking csv v1.1.6
   Compiling character_converter v2.1.0
    Checking lindera-unidic-builder v0.13.5
    Checking lindera-ipadic-builder v0.13.5
    Checking lindera-dictionary v0.13.5
    Checking lindera-ko-dic-builder v0.13.5
    Checking lindera-cc-cedict-builder v0.13.5
   Compiling lindera-ipadic v0.13.5
    Checking lindera v0.13.5
error[E0599]: no variant or associated item named `DictionaryTypeError` found for enum `lindera_core::error::LinderaErrorKind` in the current scope
  --> /Users/irevoire/.cargo/registry/src/github.com-1ecc6299db9ec823/lindera-0.13.5/src/tokenizer.rs:64:40
   |
64 |             _ => Err(LinderaErrorKind::DictionaryTypeError
   |                                        ^^^^^^^^^^^^^^^^^^^
   |                                        |
   |                                        variant or associated item not found in `lindera_core::error::LinderaErrorKind`
   |                                        help: there is a variant with a similar name: `DictionaryLoadError`

error[E0599]: no variant or associated item named `UserDictionaryTypeError` found for enum `lindera_core::error::LinderaErrorKind` in the current scope
  --> /Users/irevoire/.cargo/registry/src/github.com-1ecc6299db9ec823/lindera-0.13.5/src/tokenizer.rs:84:40
   |
84 |             _ => Err(LinderaErrorKind::UserDictionaryTypeError
   |                                        ^^^^^^^^^^^^^^^^^^^^^^^ variant or associated item not found in `lindera_core::error::LinderaErrorKind`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `lindera` due to 2 previous errors

To reproduce the issue, check out https://github.com/meilisearch/charabia at commit 82c9f3b.

Add GitHub Actions Integration

Add GitHub Actions Integration like mosuka/bayard#94 and some refactoring as follows:

  • Add GitHub Actions Integration
  • Make the output format an enum
  • Make the tokenize mode an enum
  • Optimize build script
  • Update Dockerfile
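
The two enum items in the list above could look roughly like the sketch below. The type names, variant names, and accepted strings are illustrative assumptions, not the identifiers the project actually adopted.

```rust
use std::str::FromStr;

/// Hypothetical tokenization modes; the variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Normal,
    Decompose,
}

/// Hypothetical output formats; the variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Mecab,
    Wakati,
    Json,
}

impl FromStr for Mode {
    type Err = String;

    // Map a command-line string to a mode, rejecting unknown values.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "normal" => Ok(Mode::Normal),
            "decompose" => Ok(Mode::Decompose),
            other => Err(format!("unsupported mode: {other}")),
        }
    }
}

impl FromStr for OutputFormat {
    type Err = String;

    // Map a command-line string to an output format, rejecting unknown values.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "mecab" => Ok(OutputFormat::Mecab),
            "wakati" => Ok(OutputFormat::Wakati),
            "json" => Ok(OutputFormat::Json),
            other => Err(format!("unsupported output format: {other}")),
        }
    }
}
```

With enums, an unsupported value is rejected once at parse time, and every later `match` on the mode or format is checked for exhaustiveness by the compiler.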

Prepare a trait for implementing each dictionary builder

Each dictionary builder package currently contains duplicated functions.
To ease maintenance, prepare a trait and have each dictionary builder package implement its own builder structure against it.
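
A minimal sketch of that idea, assuming the shared logic is row parsing and sorting. The trait, its methods, the builder types, and the column counts are all hypothetical and only illustrate how default methods remove the duplication.

```rust
/// One parsed dictionary row: the surface form plus the remaining columns.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub surface: String,
    pub details: Vec<String>,
}

/// Hypothetical trait: each dictionary supplies only what differs,
/// while the shared logic lives in default methods.
pub trait DictionaryBuilder {
    /// Minimum number of comma-separated columns a valid row must have.
    fn min_columns(&self) -> usize;

    /// Shared logic: parse one row, skipping rows that are too short
    /// or that have an empty surface form.
    fn parse_row(&self, row: &str) -> Option<Entry> {
        let cols: Vec<&str> = row.split(',').collect();
        if cols.len() < self.min_columns() || cols[0].is_empty() {
            return None;
        }
        Some(Entry {
            surface: cols[0].to_string(),
            details: cols[1..].iter().map(|c| c.to_string()).collect(),
        })
    }

    /// Shared logic: parse all rows and sort the entries by surface form.
    fn build(&self, rows: &[&str]) -> Vec<Entry> {
        let mut entries: Vec<Entry> =
            rows.iter().filter_map(|r| self.parse_row(r)).collect();
        entries.sort_by(|a, b| a.surface.cmp(&b.surface));
        entries
    }
}

/// Hypothetical builders; the column counts are made up for illustration.
pub struct IpadicLikeBuilder;
pub struct KoDicLikeBuilder;

impl DictionaryBuilder for IpadicLikeBuilder {
    fn min_columns(&self) -> usize {
        3
    }
}

impl DictionaryBuilder for KoDicLikeBuilder {
    fn min_columns(&self) -> usize {
        2
    }
}
```

Each builder package would then contain only its own `impl` block, and a fix to `parse_row` or `build` reaches every dictionary at once.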

Automate release tasks

Update workflows.

regression.yml: Run tests on three platforms (Linux/Windows/macOS) for each push/pull request.
periodic.yml: Run tests periodically on the stable/beta/nightly versions of Rust.
release.yml: When a tag is created, release it to GitHub and publish to crates.io.

lindera-ipadic randomly fails during build

When compiling lindera, we frequently get a build error:

 error: failed to run custom build command for `lindera-ipadic v0.10.0`

Caused by:
  process didn't exit successfully: `D:\a\milli\milli\target\release\build\lindera-ipadic-caf28ea0e76b9e29\build-script-build` (exit code: 1)
  --- stdout
  cargo:rerun-if-changed=build.rs
  cargo:rerun-if-changed=Cargo.toml

  --- stderr
  Error: Custom { kind: UnexpectedEof, error: TarError { desc: "failed to iterate over archive", io: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" } } }

It seems to be related to dictionaries.

Any idea what the cause could be? Perhaps the Google Drive download? 🤔

Change the project name again

The name Mokuzu has a similar pronunciation to mozc, so I want to avoid confusion.
Since this project is a fork of kuromoji-rs, change the name to be derived from kuromoji.

Build error in benches

   Compiling lindera v0.5.1 (/Users/johtani/IdeaProjects/rust-workspace/lindera-workspace/lindera/lindera)
error[E0599]: no function or associated item named `default_normal` found for struct `lindera::tokenizer::Tokenizer` in the current scope
 --> lindera/benches/bench.rs:8:40
  |
8 |         let mut tokenizer = Tokenizer::default_normal();
  |                                        ^^^^^^^^^^^^^^ function or associated item not found in `lindera::tokenizer::Tokenizer`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0599`.
error: could not compile `lindera`.

Can't build lindera-ipadic on Raspberry Pi 4B

Lindera-ipadic is a requirement of the zola static website generator written in Rust.

During the zola build, it fails while building lindera-ipadic with this error:
memory allocation of 805306368 bytes failed
error: could not compile lindera-ipadic.

Environment: Raspberry Pi 4B, 4GB memory, debian.

I have tried to give it more contiguous memory by rebooting and trying again with a fresh system and no user apps running. Even then, the system apparently can't give it 800MB (!) of presumably contiguous memory. free -mh shows 2.7GB free, but not contiguous, I imagine.

Zola developers have asked me to report this to you. They do not think lindera-ipadic requires 800MB to build.

Thanks.

Move lindera-cli to another repository

Move lindera-cli to another repository.
Currently, the lindera-cli package is managed in the lindera repository as a member of the workspace.
Keep the lindera repository for library crates only, and move binary crates like lindera-cli to a separate repository.

Publishing on crates.io

Publish on crates.io.
However, cargo publish failed with the following error:

error: api errors (status 200 OK): max upload size is: 10485760

Reconsider default LZMA dependency without any option to avoid it

Issue

PR #139, introduced in v0.9.0, makes LZMA (rust-lzma or lzma-rs) a mandatory dependency.
This forces all users to install the external library liblzma to be able to compile Lindera.

In comparison, v0.8.1 only requires adding lindera to the project's Cargo.toml.

Context

In Meilisearch we plan to use Lindera to tokenize Japanese texts, but we don't want to ask our users to install external libraries manually, in order to keep Meilisearch easy to install and easy to use.

Potential solutions

  • reconsider #139
  • choose a compression library that doesn't need a manually installed library (vendoring or rust library)
  • provide a feature flag to choose the compression method
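
The feature-flag option could be sketched as follows. The feature name `compress` and both function names are hypothetical, and the compressed branch is left as a stub rather than guessing at a particular LZMA crate's API.

```rust
/// Compression method selected at compile time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Lzma,
}

/// Report which method this build was compiled with.
/// `compress` is a hypothetical Cargo feature name; it would be declared
/// in Cargo.toml together with an optional LZMA dependency.
#[allow(unexpected_cfgs)] // the feature is undeclared when compiled outside Cargo
pub fn selected_compression() -> Compression {
    if cfg!(feature = "compress") {
        Compression::Lzma
    } else {
        Compression::None
    }
}

/// Turn embedded dictionary bytes into usable bytes.
pub fn load_dictionary_bytes(data: &[u8]) -> Result<Vec<u8>, String> {
    match selected_compression() {
        // Without the feature, the bytes are stored uncompressed.
        Compression::None => Ok(data.to_vec()),
        // With the feature, a real build would call an LZMA decoder here.
        Compression::Lzma => Err("LZMA decoding is not part of this sketch".to_string()),
    }
}
```

Users who do not enable the feature would never compile or link the LZMA dependency, at the cost of larger dictionary data.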

Thanks for maintaining Lindera 😊

Downloading and decompressing dictionaries takes a lot of time

Hey @mosuka,

We were recently facing compilation slowdowns at Meilisearch and investigated. We found that lindera-ipadic was taking a lot of time, probably to download the mecab-ipadic-2.7.0-20070801.tar.gz tarball from SourceForge.

If you want to look at the time it takes on our side, you can just execute the commands below and open the generated HTML report.

rustup update
cargo +nightly build --timings

But as we can see, the CPU is idle for a long time when it builds.

Unable to download UniDic from clrd.ninjal.ac.jp

error: failed to run custom build command for `lindera-unidic v0.13.5 (/home/minoru/github.com/lindera-morphology/lindera/lindera-unidic)`

Caused by:
  process didn't exit successfully: `/home/minoru/github.com/lindera-morphology/lindera/target/debug/build/lindera-unidic-0a9382db4954e5bf/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=build.rs
  cargo:rerun-if-changed=Cargo.toml

  --- stderr
  Error: Transport(Transport { kind: ConnectionFailed, message: Some("tls connection init failed"), url: Some(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("clrd.ninjal.ac.jp")), port: None, path: "/unidic_archive/cwj/2.1.2/unidic-mecab-2.1.2_src.zip", query: None, fragment: None }), source: Some(Custom { kind: InvalidData, error: InvalidCertificateData("invalid peer certificate: UnknownIssuer") }) })

Support user dictionary

Currently, Lindera does not support user dictionaries. Rebuilding the system dictionary to register new terms in the morphological dictionary is too much of a burden for light users.
So we're going to support a simple user dictionary like Kuromoji's.
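
As a rough illustration of what a simple user dictionary involves, the sketch below parses a small CSV into a lookup table. The three-column layout (`surface,part_of_speech,reading`), the `#` comment rule, and all names are assumptions made for this example, not the format the project defines.

```rust
use std::collections::HashMap;

/// What the user registers for one surface form.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserEntry {
    pub part_of_speech: String,
    pub reading: String,
}

/// Parse a simple user dictionary.
/// Assumed format, one term per line: surface,part_of_speech,reading
/// Blank lines and lines starting with '#' are skipped.
pub fn parse_user_dictionary(csv: &str) -> Result<HashMap<String, UserEntry>, String> {
    let mut entries = HashMap::new();
    for (idx, line) in csv.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let cols: Vec<&str> = line.split(',').map(str::trim).collect();
        if cols.len() != 3 || cols[0].is_empty() {
            return Err(format!(
                "line {}: expected surface,part_of_speech,reading",
                idx + 1
            ));
        }
        entries.insert(
            cols[0].to_string(),
            UserEntry {
                part_of_speech: cols[1].to_string(),
                reading: cols[2].to_string(),
            },
        );
    }
    Ok(entries)
}
```

A tokenizer could consult such a table before falling back to the system dictionary, so new terms can be added without rebuilding it.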
