isoxya-plugin-spellchecker's Introduction

Isoxya plugin Spellchecker

Isoxya plugin Spellchecker spellchecks entire websites, even those with millions of pages, and supports 7 languages. It is a plugin for the Isoxya web crawler.

https://hub.docker.com/r/tiredpixel/isoxya-plugin-spellchecker
https://github.com/tiredpixel/isoxya-plugin-spellchecker

Languages

Code   Language   Variants
en *   English    gb (BrE), us (AmE)
cs     Czech      cz
de     German     de
es     Spanish    es (European)
et     Estonian   ee
fr     French     fr
nl     Dutch      nl

*: this is the default, if no language or variant is specified

Many other languages can be added easily, since both Hunspell and MySpell dictionaries are used. If a dictionary is available in the build OS, the language can probably be added, with appropriate tests and extensions to the Isoxya engine interface.

Example

[
  {
    "paragraph": "GloBal heating is increesing droughts, soil erosion and wildfires while diminishing crop yields in the tropics and thawing permafrost near the Poles, says the report by the Intergovernmental Panel on Climate Change.",
    "results": [
      {
        "correct": false,
        "offset": 1,
        "status": "miss",
        "suggestions": [
          "Global",
          "Glob al",
          "Glob-al"
        ],
        "word": "GloBal"
      },
      {
        "correct": false,
        "offset": 19,
        "status": "miss",
        "suggestions": [
          "increasing",
          "screening",
          "resining",
          "cresting",
          "resisting"
        ],
        "word": "increesing"
      }
    ]
  }
]
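
As a rough illustration of consuming this output, the following Python sketch walks the result list and reports each miss with its first suggestion. The field names come from the example above; the helper names (`list_misses`, `word_at`) are hypothetical, the sample paragraph is abridged, and treating `offset` as 1-based is an inference from the example (the first word has offset 1), not a documented guarantee.

```python
import json

# Sample output in the shape shown above (paragraph text abridged).
SAMPLE = """
[
  {
    "paragraph": "GloBal heating is increesing droughts",
    "results": [
      {"correct": false, "offset": 1, "status": "miss",
       "suggestions": ["Global", "Glob al", "Glob-al"], "word": "GloBal"},
      {"correct": false, "offset": 19, "status": "miss",
       "suggestions": ["increasing", "screening"], "word": "increesing"}
    ]
  }
]
"""


def list_misses(data):
    """Return (word, offset, first suggestion or None) for each incorrect result."""
    misses = []
    for block in data:
        for result in block.get("results", []):
            if not result.get("correct", True):
                suggestions = result.get("suggestions") or []
                best = suggestions[0] if suggestions else None
                misses.append((result["word"], result["offset"], best))
    return misses


def word_at(paragraph, offset, length):
    """Slice a word out of the paragraph, ASSUMING offsets are 1-based."""
    return paragraph[offset - 1 : offset - 1 + length]


data = json.loads(SAMPLE)
for word, offset, best in list_misses(data):
    print(f"{word!r} at offset {offset}: try {best!r}")
```

If the offset assumption holds, `word_at(paragraph, offset, len(word))` returns the flagged word itself, which gives a cheap sanity check before highlighting text in a report.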

Installation

Compile and boot locally:

docker compose up

Images are also published using the latest tag (for development), and version-specific tags (for production). Do not use the latest tag in production!

Licence

Copyright © Nic Williams. It is free software, released under the BSD 3-Clause licence, and may be redistributed under the terms specified in LICENSE.

isoxya-plugin-spellchecker's People

Contributors

tiredpixel, tiredpixel-bot

isoxya-plugin-spellchecker's Issues

script vs human text

<script> tags or similar JSON data structures appear to be reported as human text and included in the spellchecking.

URL: ee4bc6a9-1082-4dc5-a5da-3d03046f0c8c (REDACTED)
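
One possible direction, sketched here in Python with the standard library only, is to extract text while skipping the contents of elements that do not hold prose. This is an illustration of the idea, not the plugin's implementation; the class name, function name, and the set of skipped tags are assumptions.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of non-prose elements."""

    # Hypothetical skip list; a real one may need tuning.
    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the list of text chunks that would be sent for spellchecking."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    "<p>Hello wrold</p>"
    '<script type="application/ld+json">{"name": "xyzzy"}</script>'
    "<p>Bye</p>"
)
print(visible_text(page))
```

This only addresses data embedded in skipped elements; JSON that appears inside ordinary text nodes would still need a separate heuristic.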

update

Update dependencies, using FROMFREEZE.

crash on large number of misses

Attempting to spellcheck (en) a page with a large number of misses results in a timeout and crashes the process. The spellchecking process needs to be investigated for the case where there are many potential misses, and sensible limits put in place to avoid returning too much data.

URL: 62accf17-089a-4d08-b5fa-097b5a0aa0a8 (REDACTED)

isx_proc_pick_spellchecker.1.lr8knjis1ny5@vm-prod-isx-1-proc-15061841    | [14/Nov/2019:17:30:17 +0000] fd:17: hClose: resource vanished (Broken pipe)
isx_proc_pick_spellchecker.1.lr8knjis1ny5@vm-prod-isx-1-proc-15061841    | 10.0.6.26 - - [14/Nov/2019:17:30:17 +0000] "POST /data HTTP/1.1" 500 5 - "-"
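
A minimal sketch of the "sensible limits" idea, in Python: cap both the number of misses returned and the number of suggestions per miss, and report whether anything was cut. The function name and the default limits are hypothetical, and this only bounds the size of the response; it does not by itself address time spent inside the spellchecker.

```python
def cap_results(results, max_results=100, max_suggestions=5):
    """Truncate misses and their suggestion lists.

    Returns (capped_results, truncated), where truncated is True if any
    misses were dropped. The limits are illustrative defaults only.
    """
    capped = [
        {**r, "suggestions": r.get("suggestions", [])[:max_suggestions]}
        for r in results[:max_results]
    ]
    truncated = len(results) > max_results
    return capped, truncated


# A page with 250 misses, each carrying 7 suggestions.
many = [{"word": f"w{i}", "suggestions": list("abcdefg")} for i in range(250)]
capped, truncated = cap_results(many)
print(len(capped), truncated)
```

Returning the `truncated` flag alongside the data lets a caller distinguish "page has exactly this many misses" from "more misses exist but were not returned".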

optimise spellcheck process call

The spellcheck process call is pretty slow. Investigate optimising it, potentially by keeping a process open and dispatching to it, although this creates all sorts of issues regarding multi-language support and multi-page data processing.
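
To make the "keep a process open" idea concrete, here is a hedged Python sketch that holds one long-lived child process per language and reuses it across calls. It is not how the plugin works today. The `WorkerPool` class, the one-line-in/one-line-out protocol, and the stand-in echo worker are all assumptions made for illustration; a real spellchecker is likely to need different framing (for example, several output lines per input line).

```python
import subprocess
import sys

# Stand-in worker: echoes each input line back in upper case, flushing per line.
# A real deployment would launch the spellchecker here instead (an assumption).
ECHO_WORKER = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n",
]


class WorkerPool:
    """Keep one long-lived child process per language and reuse it."""

    def __init__(self, command_for):
        self._command_for = command_for  # maps language code -> argv list
        self._workers = {}

    def _worker(self, lang):
        proc = self._workers.get(lang)
        if proc is None or proc.poll() is not None:  # start, or restart if dead
            proc = subprocess.Popen(
                self._command_for(lang),
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
                bufsize=1,
            )
            self._workers[lang] = proc
        return proc

    def ask(self, lang, line):
        """Send one line to the worker for lang and read one line back."""
        proc = self._worker(lang)
        proc.stdin.write(line + "\n")
        proc.stdin.flush()
        return proc.stdout.readline().rstrip("\n")

    def close(self):
        """Shut down all workers by closing their input and waiting."""
        for proc in self._workers.values():
            proc.stdin.close()
            proc.wait()
            proc.stdout.close()
        self._workers.clear()


pool = WorkerPool(lambda lang: ECHO_WORKER)
print(pool.ask("en", "hello"))  # served by the "en" worker
print(pool.ask("en", "again"))  # same process, no new spawn
pool.close()
```

Keying workers by language sidesteps part of the multi-language concern, at the cost of one resident process per language in use; handling several pages concurrently would still need locking or a queue per worker.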
