
ruby-stemmer's People

Contributors

aurelian, dvisockas, fivedigit, guilhermesimoes, munshkr, tenderlove, yury


ruby-stemmer's Issues

version.rb missing from 0.9.4 release

Hi,

It seems the version.rb file is missing from the latest release (0.9.4), which causes errors like this:

be rake tmp:clear --trace
rake aborted!
LoadError: cannot load such file -- lingua/version

Confirmed by downloading the gem then:

$ gem unpack ruby-stemmer-0.9.4.gem
$ cd ruby-stemmer-0.9.4
$ tree lib
lib
└── lingua
    └── stemmer.rb
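
A check along these lines could catch a missing file before a release is published. This is only a sketch: missing_features is a hypothetical helper, not part of ruby-stemmer, and the feature names in the usage comment are the two mentioned in this report.

```ruby
# Hypothetical packaging check, not part of ruby-stemmer.
# Returns the features (paths as they would be passed to `require`)
# that have no matching .rb file under lib_dir.
def missing_features(lib_dir, features)
  features.reject { |feature| File.file?(File.join(lib_dir, "#{feature}.rb")) }
end

# Example: run against the lib directory of an unpacked gem.
# missing_features('ruby-stemmer-0.9.4/lib', %w[lingua/stemmer lingua/version])
```

Given the tree listing above, such a call would be expected to report lingua/version as missing.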

Thanks!

make and FreeBSD

When building the gem on FreeBSD, the build fails because it invokes 'make' instead of 'gmake'. Using 'gmake' instead makes it work just fine.

The build could use uname to detect the operating system and pick the right make program.
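
One way this selection might look, as a rough sketch. It reads the host OS from RbConfig rather than shelling out to uname, which is a substitution on my part; make_program is a hypothetical name, and the sketch assumes GNU make is installed as gmake on BSD systems.

```ruby
require 'rbconfig'

# Hypothetical helper: pick the make program from the host OS string.
# Assumes GNU make is available as `gmake` on BSD systems.
def make_program(host_os = RbConfig::CONFIG['host_os'])
  host_os =~ /bsd/i ? 'gmake' : 'make'
end

make_program('freebsd12.1')  # "gmake"
make_program('linux-gnu')    # "make"
```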

Stemming single-item array should yield an array

Hi, I think this behaviour is inconsistent:

Lingua.stemmer(["installation"])
"instal" # string
Lingua.stemmer(["installation", "installation"])
["instal", "instal"] # not a string!

This makes it hard to handle arbitrary input, because the caller has to post-process the result to normalise the return type. I would suggest the first example should yield ["instal"].
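
Until the behaviour changes, a caller could normalise the return type with a small wrapper. The sketch below is hypothetical: stem_consistently is not part of the gem, and the block stands in for whatever stems a single word (toy_stem is a toy stand-in, not the Snowball algorithm).

```ruby
# Hypothetical wrapper: the block stands in for any single-word stemmer.
# An Array input always yields an Array; a String input yields a String.
def stem_consistently(input, &stem_one)
  return input.map { |word| stem_one.call(word) } if input.is_a?(Array)

  stem_one.call(input)
end

# Stand-in stemmer for illustration only; it is not the Snowball algorithm.
toy_stem = ->(word) { word.sub(/lation\z/, '') }

stem_consistently(['installation'], &toy_stem)  # Array in, Array out
stem_consistently('installation', &toy_stem)    # String in, String out
```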

jruby support

Is it possible to add jruby support with minimal changes?

Bundler uses :require, not :lib

gem 'ruby-stemmer', '>=0.8.3', :lib => 'lingua/stemmer' should be

gem 'ruby-stemmer', '>=0.8.3', :require => 'lingua/stemmer'

Ruby 1.9 and encoding

In Ruby 1.9, the result of stem is encoded as ASCII-8BIT by default. It should use the specified encoding instead (or maybe the default string encoding).

How to reproduce:

# encoding: utf-8
require 'lingua/stemmer'
s = Lingua::Stemmer.new(:language => "ro")
result = s.stem("așezare")
puts "test".encoding # => UTF-8
puts s.encoding # => UTF_8
puts result.encoding # => ASCII-8BIT

Workaround:

result.force_encoding "utf-8"
puts result.encoding #=> "UTF-8"
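
The workaround can be wrapped so that it never mutates the original string and never mislabels bytes that are not valid UTF-8. This is a sketch only; retag_as_utf8 is a hypothetical helper, and the binary string below simulates the stemmer's result rather than calling the gem.

```ruby
# Hypothetical helper: re-tag a binary string as UTF-8 without changing its bytes.
# Falls back to the untouched input if the bytes are not valid UTF-8.
def retag_as_utf8(raw)
  tagged = raw.dup.force_encoding(Encoding::UTF_8)
  tagged.valid_encoding? ? tagged : raw
end

raw = "a\u0219ezare".b   # simulates an ASCII-8BIT result ("așezare" as raw bytes)
fixed = retag_as_utf8(raw)
puts fixed.encoding      # prints UTF-8
```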

[Windows] Can't install stemmer by any means.

I've tried all the three installations for Windows, but it is still the same.

Temporarily enhancing PATH for MSYS/MINGW...
Building native extensions. This could take a while...
ERROR:  Error installing ruby-stemmer:
        ERROR: Failed to build gem native extension.

    current directory: C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/ruby-stemmer-3.0.0/ext/lingua        
C:/Ruby27-x64/bin/ruby.exe -I C:/Ruby27-x64/lib/ruby/2.7.0 -r ./siteconf20201206-11216-1swnql7.rb extconf.rb
The filename, directory name, or volume label syntax is incorrect.
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:
        --with-opt-dir
        --without-opt-dir
        --with-opt-include
        --without-opt-include=${opt-dir}/include
        --with-opt-lib
        --without-opt-lib=${opt-dir}/lib
        --with-make-prog
        --without-make-prog
        --srcdir=.
        --curdir
        --ruby=C:/Ruby27-x64/bin/$(RUBY_BASE_NAME)

extconf failed, exit code 1

Gem files will remain installed in C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/ruby-stemmer-3.0.0 for inspection.
Results logged to C:/Ruby27-x64/lib/ruby/gems/2.7.0/extensions/x64-mingw32/2.7.0/ruby-stemmer-3.0.0/gem_make.out

Mac OS X installation with/without ARCHFLAGS + rvm not working

Hi.
Compilation on mac with rvm is still not working right.
There are two problems with the following code.


  1. It does not allow passing in ARCHFLAGS. Instead, it executes the line whenever ARCHFLAGS is set. I don't think that's the desired behavior.

To test it, make sure ARCHFLAGS is set first, e.g.
ARCHFLAGS="-arch x86_64" gem install ruby-stemmer

  2. The second issue is that the regex part below does not match anything on my system because there's nothing after 'executable'. This causes the error others have reported.
    /executable (.+)$/

The result of file on Mac OS X 10.6.7 with rvm 1.6.5 is
/Users/mmullis/.rvm/rubies/ruby-1.9.2-p180/bin/ruby: Mach-O 64-bit executable
/Users/mmullis/.rvm/rubies/ruby-1.8.7-p334/bin/ruby: Mach-O 64-bit executable

This is why we need to specify ARCHFLAGS and not rely on the detection code.

If you're getting something after "executable" for the last one, there must be a difference in either how the ruby is getting built or the signature files that libmagic uses.
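
The mismatch can be reproduced without running file at all. In the sketch below, arch_from_file_output is a hypothetical helper, and the second sample line is an assumed format for systems where file does print an architecture; only the first sample comes from the output above.

```ruby
# The pattern quoted above, applied to one line of `file` output.
ARCH_PATTERN = /executable (.+)$/

# Hypothetical helper: returns the text after "executable", or nil when
# `file` prints nothing there (as in the output reported above).
def arch_from_file_output(line)
  match = ARCH_PATTERN.match(line.strip)
  match && match[1]
end

reported = '/Users/mmullis/.rvm/rubies/ruby-1.9.2-p180/bin/ruby: Mach-O 64-bit executable'
with_arch = '/usr/bin/ruby: Mach-O 64-bit executable x86_64' # assumed format, not from the report

arch_from_file_output(reported)   # nil, so the caller has to fall back to ARCHFLAGS
arch_from_file_output(with_arch)  # "x86_64"
```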

thanks,
michael.

Mac OS X installation for 1.9.2 (rvm)

With 0.8.2, installation on Mac OS X fails when using

gem install ruby-stemmer

because libstemmer.o is compiled incorrectly.

There are two bugs in extconf.rb where it attempts to determine the architecture on Macs.

I haven't tried to fix the detection logic, but I fixed the mistake that prevented the architecture from being forced by an external parameter. You need to change

unless ENV['ARCHFLAGS'].nil?

to

if ENV['ARCHFLAGS'].nil?

This lets the build start correctly via ARCHFLAGS='-arch x86_64' gem install ruby-stemmer

but for it to finish successfully you also need to run ranlib on libstemmer.o,

so I've added

# On Mac OS X, run ranlib so the archive gets a table of contents that ld can read
if RUBY_PLATFORM =~ /darwin/
  system "ranlib #{File.expand_path(File.join(LIBSTEMMER, 'libstemmer.o'))}"
end

after the "#{make} libstemmer.o" step. Built with these fixes, the gem installed successfully under an rvm-managed Ruby 1.9.2 on Mac OS X 10.6 via the "ARCHFLAGS='-arch x86_64' gem install ruby-stemmer" command.

Please investigate, and consider adding these fixes to the next version.
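
The precedence this fix is after (an explicit ARCHFLAGS wins, detection runs only when it is unset) might be sketched as follows. resolve_archflags is a hypothetical name, and the optional block stands in for whatever detection logic extconf.rb uses; none of this is the gem's actual code.

```ruby
# Hypothetical helper: decide which architecture flags to build with.
# An explicit ARCHFLAGS value always wins; the optional block is only
# consulted when ARCHFLAGS is unset or blank.
def resolve_archflags(env = ENV)
  explicit = env['ARCHFLAGS'].to_s.strip
  return explicit unless explicit.empty?

  arch = block_given? ? yield : nil
  arch ? "-arch #{arch}" : ''
end

resolve_archflags({ 'ARCHFLAGS' => '-arch x86_64' }) { 'i386' }  # "-arch x86_64"
resolve_archflags({}) { 'i386' }                                 # "-arch i386"
```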

This stemmer doesn't support spanish language

Proof of concept:

stemmer = Lingua::Stemmer.new(:language => 'spanish', :encoding => 'UTF-8')
stemmer.stem('piano')
=> "pian"

I think you understand Spanish since you live in Barcelona. I think what is failing is the internal port, or the part that is written in C.

Open Solaris and ar paths

OpenSolaris:

ar: libstemmer.o not in archive format.

To make it work:

export CC=/usr/bin/gcc
cd /opt/src/
git clone git://github.com/aurelian/ruby-stemmer.git
cd ruby-stemmer
/opt/bin/ruby extconf.rb
cd libstemmer_c
make

ar -cru libstemmer_foo.o stem_ISO_8859_1_danish.o stem_UTF_8_danish.o stem_ISO_8859_1_dutch.o stem_UTF_8_dutch.o stem_ISO_8859_1_english.o stem_UTF_8_english.o stem_ISO_8859_1_finnish.o stem_UTF_8_finnish.o stem_ISO_8859_1_french.o stem_UTF_8_french.o stem_ISO_8859_1_german.o stem_UTF_8_german.o stem_ISO_8859_1_hungarian.o stem_UTF_8_hungarian.o stem_ISO_8859_1_italian.o stem_UTF_8_italian.o stem_ISO_8859_1_norwegian.o stem_UTF_8_norwegian.o stem_ISO_8859_1_porter.o stem_UTF_8_porter.o stem_ISO_8859_1_portuguese.o stem_UTF_8_portuguese.o stem_ISO_8859_2_romanian.o stem_UTF_8_romanian.o stem_KOI8_R_russian.o stem_UTF_8_russian.o stem_ISO_8859_1_spanish.o stem_UTF_8_spanish.o stem_ISO_8859_1_swedish.o stem_UTF_8_swedish.o stem_UTF_8_turkish.o api.o utilities.o libstemmer.o

cp libstemmer_foo.o libstemmer.o
cd ..
make
/opt/bin/ruby test.rb
make install

Change the project layout

  1. Use jeweler to build and maintain the library
  2. Move project from NRR to its own home on rubyforge
  3. Create a new layout for the library (ext/ lib/ test/)

libstemmer_c build is broken on Ubuntu

Hi,

make clean all fails with the following log:

rm -f stemwords *.o src_c/*.o runtime/*.o libstemmer/*.o
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_latin.o src_c/stem_UTF_8_latin.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_danish.o src_c/stem_UTF_8_danish.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_dutch.o src_c/stem_UTF_8_dutch.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_english.o src_c/stem_UTF_8_english.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_finnish.o src_c/stem_UTF_8_finnish.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_french.o src_c/stem_UTF_8_french.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_german.o src_c/stem_UTF_8_german.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_hungarian.o src_c/stem_UTF_8_hungarian.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_italian.o src_c/stem_UTF_8_italian.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_norwegian.o src_c/stem_UTF_8_norwegian.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_porter.o src_c/stem_UTF_8_porter.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_portuguese.o src_c/stem_UTF_8_portuguese.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_romanian.o src_c/stem_UTF_8_romanian.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_russian.o src_c/stem_UTF_8_russian.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_spanish.o src_c/stem_UTF_8_spanish.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_swedish.o src_c/stem_UTF_8_swedish.c
cc -Iinclude -fPIC    -c -o src_c/stem_UTF_8_turkish.o src_c/stem_UTF_8_turkish.c
cc -Iinclude -fPIC    -c -o runtime/api.o runtime/api.c
cc -Iinclude -fPIC    -c -o runtime/utilities.o runtime/utilities.c
cc -Iinclude -fPIC    -c -o libstemmer/libstemmer_utf8.o libstemmer/libstemmer_utf8.c
ar -cru libstemmer.o src_c/stem_UTF_8_latin.o src_c/stem_UTF_8_danish.o src_c/stem_UTF_8_dutch.o src_c/stem_UTF_8_english.o src_c/stem_UTF_8_finnish.o src_c/stem_UTF_8_french.o src_c/stem_UTF_8_german.o src_c/stem_UTF_8_hungarian.o src_c/stem_UTF_8_italian.o src_c/stem_UTF_8_norwegian.o src_c/stem_UTF_8_porter.o src_c/stem_UTF_8_portuguese.o src_c/stem_UTF_8_romanian.o src_c/stem_UTF_8_russian.o src_c/stem_UTF_8_spanish.o src_c/stem_UTF_8_swedish.o src_c/stem_UTF_8_turkish.o runtime/api.o runtime/utilities.o libstemmer/libstemmer_utf8.o
cc -o stemwords examples/stemwords.o libstemmer.o
examples/stemwords.o: file not recognized: File format not recognized
collect2: error: ld returned 1 exit status
make: *** [stemwords] Error 1

on Ubuntu

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

When I updated libstemmer_c to the one from https://github.com/snowballstem/snowball/tree/master, everything compiled and works as expected.

P.S. I updated everything except the Latin algorithm. The master version produces both a noun form and a verb form, as stated at https://lists.tartarus.org/mailman/private/snowball-discuss/2017-June/001613.html . Since you hold an outdated version of libstemmer_c, I suspect your Latin stemmer comes from an older version (one that returned the noun form only), without any extra edits of your own. Is that right?

Could you possibly properly update your libstemmer_c version to snowball's master?

Compile problems on OSX

On OSX Snow Leopard and Ruby-1.8.7 p72 compilation fails with the following message:

ld: in /Users/christian/.rvm/gems/ruby/1.8.7/gems/ruby-stemmer-0.6.4/libstemmer_c/libstemmer.o, archive has no table of contents
collect2: ld returned 1 exit status
make: *** [stemmer_native.bundle] Error 1

The compile commands:
make
gcc -I. -I/Users/christian/.rvm/ruby-1.8.7-tv1_8_7_72/lib/ruby/1.8/i686-darwin10.0.0 -I/Users/christian/.rvm/ruby-1.8.7-tv1_8_7_72/lib/ruby/1.8/i686-darwin10.0.0 -I. -DHAVE_LIBSTEMMER_H -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch x86_64 -pipe -fno-common -I/Users/christian/.rvm/gems/ruby/1.8.7/gems/ruby-stemmer-0.6.4/libstemmer_c/include -c stemmer.c
cc -dynamic -bundle -undefined suppress -flat_namespace -o stemmer_native.bundle stemmer.o -L. -L/Users/christian/.rvm/ruby-1.8.7-tv1_8_7_72/lib -L. -Wl,-syslibroot /Developer/SDKs/MacOSX10.6.sdk -arch x86_64 -L/Users/christian/.rvm/gems/ruby/1.8.7/gems/ruby-stemmer-0.6.4/libstemmer_c /Users/christian/.rvm/gems/ruby/1.8.7/gems/ruby-stemmer-0.6.4/libstemmer_c/libstemmer.o -ldl -lobjc

The Sphinx makefile is able to compile libstemmer_c without problems on the same machine.

Fail to build gem on MacOS 10.4 and MacOS 10.5

Old versions of Mac OS X will fail to build the gem because ARCHFLAGS reports x86_64.

Output from 10.4:

[...]
cc -Iinclude -fPIC -arch x86_64   -c -o runtime/utilities.o runtime/utilities.c
cc -Iinclude -fPIC -arch x86_64   -c -o libstemmer/libstemmer.o libstemmer/libstemmer.c
[...] libstemmer/libstemmer.o
checking for libstemmer.h... yes
creating Makefile
 
make
gcc -I. -I/W/lib/ruby/1.8/i686-darwin8.11.1 -I/W/lib/ruby/1.8/i686-darwin8.11.1 -I. -DHAVE_LIBSTEMMER_H  -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE   -fno-common -g -O2 -pipe -fno-common  -I/W/lib/ruby/gems/1.8/gems/ruby-stemmer-0.6.3/libstemmer_c/include   -c stemmer.c
cc -dynamic -bundle -undefined suppress -flat_namespace -o stemmer_native.bundle stemmer.o -L. -L/W/lib -L.     -L/W/lib/ruby/gems/1.8/gems/ruby-stemmer-0.6.3/libstemmer_c /W/lib/ruby/gems/1.8/gems/ruby-stemmer-0.6.3/libstemmer_c/libstemmer.o  -lpthread -ldl -lobjc  
/usr/bin/ld: truncated or malformed archive: /W/lib/ruby/gems/1.8/gems/ruby-stemmer-0.6.3/libstemmer_c/libstemmer.o (ranlib structures in table of contents extends past the end of the table of contents, can't load from it)
collect2: ld returned 1 exit status
make: *** [stemmer_native.bundle] Error 1

A possible solution is to detect the Ruby architecture by running file on the path returned by which ruby.

Gem installation failing on Windows 10

Running
gem install ruby-stemmer
on Windows 10 produces this error:

Microsoft Windows [Version 10.0.19043.1586]

C:\Users\username>gem install ruby-stemmer
Temporarily enhancing PATH for MSYS/MINGW...
Building native extensions. This could take a while...
ERROR:  Error installing ruby-stemmer:
        ERROR: Failed to build gem native extension.

    current directory: C:/Ruby30-x64/lib/ruby/gems/3.0.0/gems/ruby-stemmer-3.0.0/ext/lingua
C:/Ruby30-x64/bin/ruby.exe -I C:/Ruby30-x64/lib/ruby/3.0.0 -r ./siteconf20220316-10816-o9w6w1.rb extconf.rb
The filename, directory name, or volume label syntax is incorrect.
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:
        --with-opt-dir
        --without-opt-dir
        --with-opt-include
        --without-opt-include=${opt-dir}/include
        --with-opt-lib
        --without-opt-lib=${opt-dir}/lib
        --with-make-prog
        --without-make-prog
        --srcdir=.
        --curdir
        --ruby=C:/Ruby30-x64/bin/$(RUBY_BASE_NAME)

extconf failed, exit code 1

Gem files will remain installed in C:/Ruby30-x64/lib/ruby/gems/3.0.0/gems/ruby-stemmer-3.0.0 for inspection.
Results logged to C:/Ruby30-x64/lib/ruby/gems/3.0.0/extensions/x64-mingw32/3.0.0/ruby-stemmer-3.0.0/gem_make.out

None of the above configuration options resolves the issue.

The "gem_make.out" file contains the same error code.

Trying the recommended Windows options
gem install ruby-stemmer --platform=x86-mingw32
and
gem install ruby-stemmer --platform=x86-mswin32
returns the same error.

My ruby version is
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x64-mingw32]

Bulgarian stemmer

Hi,

I have the rules for a Bulgarian stemmer,
but I don't really have the resources to put into coding this in C.
I can share the rules if someone is willing to write it in C. They are the work of P. Nakov.

Regards,
Yavor

libstemmer_c to build using the same ARCH as ruby

Currently, the bundled library libstemmer_c is not built for the same architecture that ruby itself was built for.

Somehow, in the libstemmer Makefile, we need to detect which ARCH ruby was built for and use the same one.

a gist
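
One possible direction, sketched under assumptions: read the architecture flags recorded in RbConfig when Ruby was built and pass them to make. ruby_arch_flags and libstemmer_make_command are hypothetical helpers, and overriding CFLAGS on the make command line is an assumption about the bundled Makefile, not something confirmed here. The "-Iinclude -fPIC" flags are copied from the build logs in the other reports.

```ruby
require 'rbconfig'

# Hypothetical helper: the architecture flags Ruby itself was built with.
# Assumes RbConfig's ARCH_FLAG entry is populated on the platform in question;
# returns an empty string when it is not.
def ruby_arch_flags(config = RbConfig::CONFIG)
  config['ARCH_FLAG'].to_s.strip
end

# Hypothetical helper: argument list for building libstemmer.o with those flags.
# Assumes the bundled Makefile honours a CFLAGS override on the command line.
def libstemmer_make_command(make, config = RbConfig::CONFIG)
  cflags = "-Iinclude -fPIC #{ruby_arch_flags(config)}".strip
  [make, "CFLAGS=#{cflags}", 'libstemmer.o']
end

# Usage (from inside the libstemmer_c directory):
# system(*libstemmer_make_command('make'))
```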

RubyNLP

Dear Aurelian,

I've recently added your project to our RubyNLP list: https://github.com/arbox/nlp-with-ruby

I wonder if you want to participate in the Ruby for NLP network. You could do this in a very simple step by adding the rubynlp topic to your GitHub repository.

Thank you for the project!
