php-speller's Introduction

php-speller

PHP spell check library.

Currently supported backends:

  • aspell
  • hunspell
  • ispell

Installation

With Composer:

$ composer require mekras/php-speller

Usage

  1. Create a text source object from a string, a file or something else, using one of the Mekras\Speller\Source\Source implementations (see Sources below).
  2. Create a speller instance (Hunspell, Ispell or any other implementation of Mekras\Speller\Speller).
  3. Call the Speller::checkText() method.

use Mekras\Speller\Hunspell\Hunspell;
use Mekras\Speller\Source\StringSource;

$source = new StringSource('Tiger, tigr, burning bright');
$speller = new Hunspell();
$issues = $speller->checkText($source, ['en_GB', 'en']);

echo $issues[0]->word; // -> "tigr"
echo $issues[0]->line; // -> 1
echo $issues[0]->offset; // -> 7
echo implode(',', $issues[0]->suggestions); // -> tiger, trig, tier, tigris, tigress

You can list the languages supported by a backend:

/** @var Mekras\Speller\Speller $speller */
print_r($speller->getSupportedLanguages());

See examples for more info.

Source encoding

For aspell, hunspell and ispell, the source text encoding should match the dictionary encoding. You can use IconvSource to convert the source.
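A minimal sketch of the conversion itself, using only PHP's built-in iconv() rather than the library: it assumes UTF-8 input and a KOI8-R dictionary (both chosen only as examples) and converts the text to the dictionary encoding and back. The helper name toDictionaryEncoding() is hypothetical.

```php
<?php
// Hypothetical helper: convert text into the encoding a dictionary expects.
// UTF-8 and KOI8-R are used purely as example encodings.
function toDictionaryEncoding(string $text, string $from, string $to): string
{
    $converted = iconv($from, $to, $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert text from $from to $to");
    }

    return $converted;
}

$utf8 = 'привет';                                        // 6 Cyrillic letters, 12 bytes in UTF-8
$koi8 = toDictionaryEncoding($utf8, 'UTF-8', 'KOI8-R');  // 1 byte per letter in KOI8-R

echo strlen($utf8), ' bytes -> ', strlen($koi8), " bytes\n"; // 12 bytes -> 6 bytes

// Converting back restores the original string.
echo toDictionaryEncoding($koi8, 'KOI8-R', 'UTF-8') === $utf8 ? "round trip ok\n" : "mismatch\n";
```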

Aspell

This backend uses the aspell program, so aspell must be installed on the system.

use Mekras\Speller\Aspell\Aspell;

$speller = new Aspell();

The path to the binary can be set in the constructor:

use Mekras\Speller\Aspell\Aspell;

$speller = new Aspell('/usr/local/bin/aspell');

Custom Dictionary

You can use a custom dictionary for aspell. The dictionary needs to be in the following format:

personal_ws-1.1 [lang] [words]

Here [lang] should be the short code of the language you are using (e.g. en) and [words] is the number of words in the dictionary. Make sure there are no trailing spaces after the words. Each word must be on its own line.

use Mekras\Speller\Aspell\Aspell;
use Mekras\Speller\Dictionary;

$aspell = new Aspell();
$aspell->setPersonalDictionary(new Dictionary('/path/to/custom.pws'));
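If the word list lives in your application, the file can be generated. The sketch below assumes the format described above; buildPersonalDictionary() is a hypothetical helper, not part of php-speller. It trims each word, drops empty entries and duplicates, and writes the resulting word count into the header.

```php
<?php
// Hypothetical helper: build the contents of an aspell personal word list:
// a "personal_ws-1.1 <lang> <count>" header, then one word per line
// with no trailing spaces.
function buildPersonalDictionary(string $lang, array $words): string
{
    // Trim stray whitespace, then drop empty entries and duplicates.
    $clean = array_values(array_unique(array_filter(
        array_map('trim', $words),
        static function (string $word): bool {
            return $word !== '';
        }
    )));

    $lines = array_merge(
        [sprintf('personal_ws-1.1 %s %d', $lang, count($clean))],
        $clean
    );

    return implode("\n", $lines) . "\n";
}

echo buildPersonalDictionary('en', ['php-speller ', 'Hunspell', 'aspell']);
```

The returned string can be written to a .pws file with file_put_contents(), and that path passed to new Dictionary(...).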

Important

  • aspell accepts only one language at a time, so only the first item of the $languages argument of Aspell::checkText() is used.
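To illustrate the consequence, here is a hypothetical sketch of composing an aspell command line from the $languages argument. The argument order mirrors the command logged in the first issue under php-speller's Issues (/usr/bin/aspell --encoding=UTF-8 -a --lang=en); the function itself is an assumption, not the library's code.

```php
<?php
// Hypothetical sketch: aspell takes a single --lang value, so only the
// first requested language ends up on the command line.
function composeAspellCommand(string $binary, array $languages, string $encoding = 'UTF-8'): array
{
    $command = [$binary, '--encoding=' . $encoding, '-a'];

    $first = reset($languages); // first item regardless of key; false if empty
    if ($first !== false) {
        $command[] = '--lang=' . $first;
    }

    return $command;
}

print_r(composeAspellCommand('/usr/bin/aspell', ['en', 'de'])); // "de" is ignored
```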

Hunspell

This backend uses the hunspell program, so hunspell must be installed on the system.

use Mekras\Speller\Hunspell\Hunspell;

$speller = new Hunspell();

The path to the binary can be set in the constructor:

use Mekras\Speller\Hunspell\Hunspell;

$speller = new Hunspell('/usr/local/bin/hunspell');

You can set an additional dictionary path:

use Mekras\Speller\Hunspell\Hunspell;

$speller = new Hunspell();
$speller->setDictionaryPath('/var/spelling/custom');

You can specify custom dictionaries to use:

use Mekras\Speller\Hunspell\Hunspell;

$speller = new Hunspell();
$speller->setDictionaryPath('/my_app/spelling');
$speller->setCustomDictionaries(['tech', 'titles']);
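How these settings reach hunspell is not spelled out here. A plausible sketch, assuming the dictionary path is passed through hunspell's DICPATH environment variable and the languages plus custom dictionaries are joined into hunspell's comma-separated -d list; composeHunspellCall() is hypothetical and only builds the arguments, it does not run anything.

```php
<?php
// Hypothetical sketch (not php-speller's code): build hunspell arguments and
// environment from languages, custom dictionaries and a dictionary path.
// Assumes hunspell's comma-separated "-d dict1,dict2" list and its DICPATH
// environment variable for extra dictionary directories.
function composeHunspellCall(array $languages, array $customDictionaries, ?string $dictionaryPath): array
{
    // Languages first, then the custom dictionaries, e.g. "en_US,tech,titles".
    $dictionaries = implode(',', array_merge($languages, $customDictionaries));

    // Each option and its value are separate argument-vector elements.
    $command = ['hunspell', '-i', 'UTF-8', '-a', '-d', $dictionaries];

    $env = [];
    if ($dictionaryPath !== null) {
        $env['DICPATH'] = $dictionaryPath;
    }

    return ['command' => $command, 'env' => $env];
}

print_r(composeHunspellCall(['en_US'], ['tech', 'titles'], '/my_app/spelling'));
```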

Ispell

This backend uses the ispell program, so ispell must be installed on the system.

use Mekras\Speller\Ispell\Ispell;

$speller = new Ispell();

The path to the binary can be set in the constructor:

use Mekras\Speller\Ispell\Ispell;

$speller = new Ispell('/usr/local/bin/ispell');

Important

  • ispell can use only one dictionary at a time, so only the first item of the $languages argument of Ispell::checkText() is used.

Sources

Sources are an abstraction layer that lets spellers receive text from different sources, such as strings or files.

FileSource

Reads text from a file.

use Mekras\Speller\Source\FileSource;

$source = new FileSource('/path/to/file.txt');

You can specify file encoding:

use Mekras\Speller\Source\FileSource;

$source = new FileSource('/path/to/file.txt', 'windows-1251');

StringSource

Uses a string as the text source. As with FileSource, the second argument specifies the encoding:

use Mekras\Speller\Source\StringSource;

$source = new StringSource('foo', 'koi8-r');

Meta sources

Additionally, there is a set of meta sources that wrap other sources to perform extra tasks.
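The source abstraction and the wrapping done by meta sources can be sketched in plain PHP. All names below (TextSource, InMemorySource, BomStrippingSource) are illustrative stand-ins, not the library's classes; the meta source simply removes a leading UTF-8 byte order mark from whatever the wrapped source returns.

```php
<?php
// Illustrative stand-ins for the source abstraction (not the library's classes).
interface TextSource
{
    public function getAsString(): string;

    public function getEncoding(): string;
}

// Plain source: text held in memory.
final class InMemorySource implements TextSource
{
    /** @var string */
    private $text;
    /** @var string */
    private $encoding;

    public function __construct(string $text, string $encoding = 'UTF-8')
    {
        $this->text = $text;
        $this->encoding = $encoding;
    }

    public function getAsString(): string
    {
        return $this->text;
    }

    public function getEncoding(): string
    {
        return $this->encoding;
    }
}

// Meta source: wraps another source and strips a leading UTF-8 BOM.
final class BomStrippingSource implements TextSource
{
    /** @var TextSource */
    private $inner;

    public function __construct(TextSource $inner)
    {
        $this->inner = $inner;
    }

    public function getAsString(): string
    {
        $text = $this->inner->getAsString();

        return strncmp($text, "\xEF\xBB\xBF", 3) === 0 ? (string) substr($text, 3) : $text;
    }

    public function getEncoding(): string
    {
        // The wrapped source's encoding is passed through unchanged.
        return $this->inner->getEncoding();
    }
}

$source = new BomStrippingSource(new InMemorySource("\xEF\xBB\xBFTiger, tigr, burning bright"));
echo $source->getAsString(); // Tiger, tigr, burning bright
```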

HtmlSource

Returns the user-visible text from HTML.

use Mekras\Speller\Source\HtmlSource;
use Mekras\Speller\Source\StringSource;

$source = new HtmlSource(
    new StringSource('<a href="#" title="Foo">Bar</a> Baz')
);
echo $source->getAsString(); // Foo Bar Baz

The encoding is detected via DOMDocument::$encoding.
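A minimal sketch of the same idea with PHP's DOM extension, for illustration only: it walks the parsed tree, collects text nodes plus title and alt attributes, and skips script and style. Which attributes the real HtmlSource treats as visible is an assumption here; the example above only shows title. The function names are hypothetical, and ASCII input is assumed (loadHTML() needs a charset hint for other text).

```php
<?php
// Sketch: collect user-visible text (text nodes plus title/alt attributes).
function collectVisibleText(DOMNode $node, array &$parts): void
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $parts[] = $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            $tag = strtolower($child->nodeName);
            if ($tag === 'script' || $tag === 'style') {
                continue; // never shown to the user
            }
            foreach (['title', 'alt'] as $attribute) {
                if ($child->hasAttribute($attribute)) {
                    $parts[] = $child->getAttribute($attribute);
                }
            }
            collectVisibleText($child, $parts);
        }
    }
}

function visibleText(string $html): string
{
    $document = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $parts = [];
    collectVisibleText($document, $parts);

    // Join the fragments and collapse runs of whitespace.
    return trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
}

echo visibleText('<a href="#" title="Foo">Bar</a> Baz'); // Foo Bar Baz
```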

IconvSource

A meta source that converts the encoding of another source:

use Mekras\Speller\Source\FileSource;
use Mekras\Speller\Source\IconvSource;

// Convert file contents from windows-1251 to koi8-r.
$source = new IconvSource(
    new FileSource('/path/to/file.txt', 'windows-1251'),
    'koi8-r'
);

XliffSource

Loads text from XLIFF files.

use Mekras\Speller\Source\XliffSource;

$source = new XliffSource(__DIR__ . '/fixtures/test.xliff');

Source filters

Filters are used internally to strip all non-text content received from a source. To preserve the original word locations (line and column numbers), all filters replace non-text content with spaces.
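A sketch of the space-replacement technique, assuming a deliberately simple regex for HTML tags (blankOutTags() is hypothetical, not one of the library's filters): each tag is overwritten byte by byte with spaces and newlines inside a tag are kept, so the byte offsets and line numbers of the remaining words do not change.

```php
<?php
// Sketch of an offset-preserving filter: every HTML tag is overwritten with
// spaces (newlines inside a tag are kept), so words keep their original
// byte offsets and line numbers.
function blankOutTags(string $text): string
{
    return preg_replace_callback(
        '/<[^>]*>/',
        static function (array $match): string {
            // Replace every byte except "\n" with a space.
            return preg_replace('/[^\n]/', ' ', $match[0]);
        },
        $text
    );
}

$original = "<b>tigr</b>,\n<i>burning</i>";
$filtered = blankOutTags($original);

var_dump(strpos($original, 'tigr') === strpos($filtered, 'tigr')); // bool(true)
```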

Available filters:

php-speller's People

Contributors

caugner, cniry, dependabot[bot], dergel, genericmilk, ibarrajo, icanhazstring, mekras, mmetayer, olivierpontier, peter279k, rouliane

php-speller's Issues

Calling Aspell::getSupportedLanguages() breaks checkText()

When getSupportedLanguages() is called before checkText(), the process command is not updated, even though the arguments are correct.

###########################
/vendor/mekras/php-speller/src/ExternalSpeller.php (line 218)
########## DEBUG ##########
[
	(int) 0 => '/usr/bin/aspell',
	(int) 1 => '--encoding=UTF-8',
	(int) 2 => '-a',
	(int) 3 => '--lang=en'
]
###########################
/vendor/mekras/php-speller/src/ExternalSpeller.php (line 224)
########## DEBUG ##########
object(Symfony\Component\Process\Process) {
	[private] callback => null
	[private] hasCallback => false
	[private] commandline => [
		(int) 0 => '/usr/bin/aspell',
		(int) 1 => 'dump',
		(int) 2 => 'dicts'
	]
	[private] cwd => '/var/www/.../trunk/src'
	[private] env => null
	[private] input => null
	[private] starttime => (float) 1601305301.2421
	[private] lastOutputTime => (float) 1601305301.2454
	[private] timeout => (float) 600
	[private] idleTimeout => null
	[private] exitcode => (int) 0
	[private] fallbackStatus => []
	[private] processInformation => [
		'command' => 'exec '/usr/bin/aspell' 'dump' 'dicts'',
		'pid' => (int) 868,
		'running' => false,
		'signaled' => false,
		'stopped' => false,
		'exitcode' => (int) 0,
		'termsig' => (int) 0,
		'stopsig' => (int) 0
	]
	[private] outputDisabled => false
	[private] stdout => resource
	[private] stderr => resource
	[private] process => unknown
	[private] status => 'terminated'
	[private] incrementalOutputOffset => (int) 0
	[private] incrementalErrorOutputOffset => (int) 0
	[private] tty => false
	[private] pty => false
	[private] useFileHandles => false
	[private] processPipes => object(Symfony\Component\Process\Pipes\UnixPipes) {
		pipes => []
		[private] ttyMode => false
		[private] ptyMode => false
		[private] haveReadSupport => true
	}
	[private] latestSignal => null
	[private] sigchild => false
}

Adding $this->resetProcess(); to getSupportedLanguages() fixes it.

Can't open affix or dictionary files for dictionary named "default" - Part 2

Hi mekras!

It's me again ;-)

The "default" problem is happening again. I tried to debug it, and after a few days of working on this problem I finally found a workaround.

The problem occurs with Symfony 3.3.15 and later (including v4). The changelog mentions the following Process fixes:
bug #25567 [Process] Fix setting empty env vars (@nicolas-grekas)
bug #25559 [Process] Dont use getenv(), it returns arrays and can introduce subtle breaks accros PHP versions (@nicolas-grekas)
bug #25417 [Process] Dont rely on putenv(), it fails on ZTS PHP (@nicolas-grekas)

I also debugged your code, and it seems that the problem lies in the "-D" option and so on.

The workaround is to set the environment variable "LANG" directly in my Apache vhost file. But I'd rather have a final solution for this problem. I read something about the Dotenv component...

Hopefully you can look into this problem; let me know if I can help you somehow.

thx Stefan

Failed to execute "hunspell -i UTF-8 -a": Can't open affix or dictionary files for dictionary named "default".

Hello!

Sorry to bother you, but since v1.6 I get this error.
I'm running CentOS with International Ispell Version 3.2.06 (but really Hunspell 1.3.2).
After downgrading to v1.5.1 it works just fine.

Here is the code I run:

$this->speller = new Hunspell();
$this->speller->setDictionaryPath('.../hunspell/custom');
$this->speller->setCustomDictionaries(['fr']);

Maybe it's a problem with Hunspell v1.3.2 but unfortunately I'm not able to upgrade it.

thx in advance

setSupportedLanguages for Hunspell

Is it possible to add a setter for the supported languages property in the Hunspell class?
There is a problem with fetching languages from the hunspell library.

Fix incorrect escaping of the hunspell command, which causes "Can't open affix or dictionary files for dictionary named "default"".

Incorrect quoting of the hunspell command parameters causes "Can't open affix or dictionary files for dictionary named "default"" in an Alpine Docker container.

This example script

$source = new StringSource('Tiger, tigr, burning bright');
$speller = new Hunspell();
$issues = $speller->checkText($source, ['en_US']);

was calling this command:

'hunspell' '-i UTF-8' '-a' '-d en_US'

Which is not correctly parsed by hunspell on Alpine Docker container and throws error Failed to execute "'hunspell' '-i UTF-8' '-a' '-d en_US'": Can't open affix or dictionary files for dictionary named "default".

Executing this command inside the Docker container produces the same error.

Fix prepared in MR #28
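The quoting in that command suggests each array element was escaped as one shell word. A minimal sketch with PHP's escapeshellarg() (POSIX single-quote output assumed) reproduces the reported command line and shows the variant with option and value as separate elements. It does not reproduce the actual change in MR #28; toShellCommand() is a hypothetical helper.

```php
<?php
// Each element is escaped separately, so an option glued to its value
// ('-i UTF-8') reaches hunspell as one single argument.
function toShellCommand(array $argv): string
{
    return implode(' ', array_map('escapeshellarg', $argv));
}

// As reported: option and value glued together in one element.
echo toShellCommand(['hunspell', '-i UTF-8', '-a', '-d en_US']), "\n";
// 'hunspell' '-i UTF-8' '-a' '-d en_US'

// Option and value as separate elements.
echo toShellCommand(['hunspell', '-i', 'UTF-8', '-a', '-d', 'en_US']), "\n";
// 'hunspell' '-i' 'UTF-8' '-a' '-d' 'en_US'
```

With '-d en_US' passed as one argument, hunspell presumably never sees a -d option at all, which would explain the fallback to the dictionary named "default".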

Failed to execute "hunspell -D": 'hunspell' is not recognized as an internal or external command, operable program or batch file.

When I execute the following code in Laravel 5.8, this error is shown in Hunspell.php:

$source = new StringSource('Tiger, tigr, burning bright');
$speller = new Hunspell();
$issues = $speller->checkText($source, ['en_GB', 'en']);

echo $issues[0]->word; // -> "tigr"
echo $issues[0]->line; // -> 1
echo $issues[0]->offset; // -> 7
echo implode(',', $issues[0]->suggestions);

Failed to execute "hunspell -D": 'hunspell' is not recognized as an internal or external command, operable program or batch file.

I searched the internet, but in vain.
Yes, I have added the use statements. I tried debugging and found that the process's isSuccessful() returns false.
Please suggest something: how can I solve this issue?
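The message comes from the Windows shell: it could not find a hunspell executable on the PATH. A hypothetical helper (findExecutable() is not part of php-speller) can check this before building the speller; it searches a PATH-style string and, on Windows, also tries the .exe extension.

```php
<?php
// Hypothetical helper: look for an executable in the directories of a
// PATH-style string (defaults to the PATH seen by this PHP process).
function findExecutable(string $name, ?string $searchPath = null): ?string
{
    $searchPath = $searchPath ?? (string) getenv('PATH');
    // On Windows, also try the usual executable extension.
    $candidates = DIRECTORY_SEPARATOR === '\\' ? [$name . '.exe', $name] : [$name];

    foreach (explode(PATH_SEPARATOR, $searchPath) as $dir) {
        if ($dir === '') {
            continue;
        }
        foreach ($candidates as $candidate) {
            $file = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $candidate;
            if (is_file($file) && is_executable($file)) {
                return $file;
            }
        }
    }

    return null;
}

echo findExecutable('hunspell') ?? "hunspell not found on PATH\n";
```

If it returns null, hunspell is either not installed or not on the PATH seen by PHP (a web server's PATH can differ from your terminal's); in that case the full path can be passed to the Hunspell constructor, as shown in the Hunspell section.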

HtmlFilter raises Error with malformed HTML tags

Hi,

I get the following error: TypeError: strcasecmp() expects parameter 2 to be string, null given, when checking HTML text with malformed tags such as foo/>bar<br><br/>.

This issue comes from here. The error is raised when the name of the tag is null.

Throws runtime exception when loading languages

    public function getSupportedLanguages()
    {
        if (null === $this->supportedLanguages) {
            $process = $this->createProcess('-D');
            $process->run();
            if (!$process->isSuccessful()) {
                throw new \RuntimeException(sprintf('hunspell: %s', $process->getErrorOutput()));
            }

This throws an exception even though the command seemingly produces the correct output:

Command output:

.::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/home/vagrant/.openoffice.org/3/user/wordbook:.openoffice.org2/user/wordbook:.openoffice.org2.0/user/wordbook:Library/Spelling:/opt/openoffice.org/basis3.0/share/dict/ooo:/usr/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/share/dict/ooo:/opt/openoffice.org2.2/share/dict/ooo:/usr/lib/openoffice.org2.2/share/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_US
LOADED DICTIONARY:
/usr/share/hunspell/en_US.aff
/usr/share/hunspell/en_US.dic
Hunspell 1.3.2
>

Error output:

-- Error message --

    hunspell: SEARCH PATH:
    .::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/var/empty/.openoffice.org/3/user/wordbook:.openoffice.org2/user/wordbook:.openoffice.org2.0/user/wordbook:Library/Spelling:/opt/openoffice.org/basis3.0/share/dict/ooo:/usr/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/share/dict/ooo:/opt/openoffice.org2.2/share/dict/ooo:/usr/lib/openoffice.org2.2/share/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
    AVAILABLE DICTIONARIES (path is not mandatory for -d option):
    /usr/share/hunspell/en_US
    Can't open affix or dictionary files for dictionary named "default"
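In both outputs the dictionary list is present; the error case only fails afterwards, when hunspell tries to load the "default" dictionary. A sketch of reading the names from the combined stdout and stderr instead of relying on the exit status, assuming the layout shown above (one absolute POSIX path per line after the "AVAILABLE DICTIONARIES" header); parseAvailableDictionaries() is hypothetical.

```php
<?php
// Sketch: pull dictionary names out of `hunspell -D` output. The listing sits
// between the "AVAILABLE DICTIONARIES" header and the next non-path line
// ("LOADED DICTIONARY:" or the "Can't open ..." error).
function parseAvailableDictionaries(string $output): array
{
    $names = [];
    $inList = false;

    foreach (preg_split('/\R/', $output) as $line) {
        $line = trim($line);
        if (strpos($line, 'AVAILABLE DICTIONARIES') === 0) {
            $inList = true;
            continue;
        }
        if ($inList) {
            if ($line === '' || $line[0] !== '/') {
                $inList = false; // end of the listing
                continue;
            }
            $names[] = basename($line); // "/usr/share/hunspell/en_US" -> "en_US"
        }
    }

    // The same name may appear in both streams.
    return array_values(array_unique($names));
}
```

Passing stdout and stderr concatenated lets the same function handle both the successful and the failing run.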

PHP8 Support

Hi @mekras
Hi @icanhazstring

would it be possible for you to quickly add PHP8 Support? The package seems to work under PHP8, we'd just need to change "php": "^7.1" to "php": ">=7.1" in the composer.json and release a new version.

Thanks!

no word lists can be found for the language en_us

Hello,

I have installed php-speller, but it's not working. I am getting the error: no word lists can be found for the language en_us.

After searching Google, I found that I need to install Aspell-en. Please let me know how I can install Aspell-en or otherwise solve the issue.

Thanks

Maintaining the repo

Hi @mekras. First off - love this lib for spell checking. Unfortunately you seem to have other priorities.

I would love to take this repo under my care - or at least be able to collaborate as a maintainer.

Greets

Aspell output parsing for Swedish

Hello,

I get an error when parsing the output of Aspell for Swedish sentences containing words with a colon (:). In Swedish and Finnish, words can contain colons, see https://en.wikipedia.org/w/index.php?title=Colon_(punctuation)

Here is the output of aspell:

$ aspell --lang=sv --encoding=UTF-8 -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.7-20110707)
S:t Petersburg är i Ryssland
& S:t 23 0: St, Set, Sot, Söt, Stl, Stå, Sy, Ät, Åt, Est, Ost, Öst, SI, SM, TT, Sa, Se, Sk, So, Så, Ut, Yt, SJ
? Petersburg 0 4: Peters
*
*
*

The issue comes from here: as there is more than one colon in the line, $parts contains more than 2 elements, and at line 192, $parts[0][3] is not set, resulting in a notice, or even a fatal error if strict_types is enabled (PHP Fatal error: Uncaught TypeError: trim() expects parameter 1 to be string, null given).

I guess a regexp would be more suitable in this case, I can try to do a PR if needed (though I'm not sure how to implement the tests yet...)
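A regex-based sketch along the lines suggested, assuming the ispell pipe format in which the reported word itself never contains spaces: matching the word as \S+ keeps "S:t" intact, and only the text after the first ": " that follows the two numbers is split into suggestions. parsePipeLine() is a hypothetical name, and the "?" layout is taken from the sample output above.

```php
<?php
// Sketch of a parser for one line of ispell-compatible pipe output
// (e.g. "aspell -a"):
//   & <word> <count> <offset>: <suggestion>, <suggestion>, ...
//   ? <word> 0 <offset>: <guess>, ...
//   # <word> <offset>                (no suggestions)
function parsePipeLine(string $line): ?array
{
    $line = rtrim($line, "\r\n");

    if (preg_match('/^([&?]) (\S+) (\d+) (\d+): (.*)$/u', $line, $m)) {
        return [
            'word' => $m[2],
            'offset' => (int) $m[4],
            'suggestions' => $m[5] === '' ? [] : explode(', ', $m[5]),
        ];
    }
    if (preg_match('/^# (\S+) (\d+)$/u', $line, $m)) {
        return ['word' => $m[1], 'offset' => (int) $m[2], 'suggestions' => []];
    }

    return null; // "*", banner or empty line: not a misspelling
}

print_r(parsePipeLine('& S:t 23 0: St, Set, Sot'));
```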

"$speller->checkText($source, $languages)" will cache "$languages" for all subsequent requests

Hi, I found the following error:

$speller =  new ExternalSpeller(...); // In the example I use an abstract class because the error is here
$speller->checkText($source, ['en']);
// the command "hunspell -i UTF-8 -a -d en_US" will be generated
$speller->checkText($source, ['uk']);
// "hunspell -i UTF-8 -a -d en_US" will also be generated and hunspell will ignore the language change

The problem exists in mekras/php-speller/src/ExternalSpeller.php, method "composeProcess":

    private function composeProcess(array $command): Process
    {
        if ($this->process === null) {
            $this->process = new Process($command);
        }

        $this->process->setTimeout($this->timeout);

        return $this->process;
    }

because it caches the "process" and does not use the existing "resetProcess" method to clear it.
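A hypothetical sketch of that fix direction: keep the cached process only while the command is unchanged, and call resetProcess() when it changes. FakeProcess stands in for Symfony's Process so the example runs on its own; real code would need to remember the command array itself, since Process exposes the command line as a string. The same check would also cover the getSupportedLanguages() case from the first issue.

```php
<?php
// Stand-in for Symfony\Component\Process\Process (illustration only).
final class FakeProcess
{
    /** @var array */
    public $command;

    public function __construct(array $command)
    {
        $this->command = $command;
    }
}

// Hypothetical holder: reuses the process only for an identical command.
final class ProcessHolder
{
    /** @var FakeProcess|null */
    private $process;

    public function composeProcess(array $command): FakeProcess
    {
        // Rebuild when nothing is cached yet or the command has changed.
        if ($this->process === null || $this->process->command !== $command) {
            $this->resetProcess();
            $this->process = new FakeProcess($command);
        }

        return $this->process;
    }

    public function resetProcess(): void
    {
        $this->process = null;
    }
}

$holder = new ProcessHolder();
$en = $holder->composeProcess(['hunspell', '-i', 'UTF-8', '-a', '-d', 'en_US']);
$uk = $holder->composeProcess(['hunspell', '-i', 'UTF-8', '-a', '-d', 'uk_UA']);
var_dump($en === $uk); // bool(false): the language change is honoured
```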
