kimiamania / mitlm
License: BSD 3-Clause "New" or "Revised" License
What steps will reproduce the problem?
1. Download the latest version (0.4.1)
2. Run make -j
What is the expected output? What do you see instead?
Compilation errors related to the 'fortran_wrapper'.
What version of the product are you using? On what operating system?
Version 0.4.1 on Mac OS X 10.9.2
Please provide any additional information below.
The errors:
src/optimize/fortran_wrapper.c:38:6: error: function cannot return function
type 'void (int *, int *, double *, double *, double *, int *, double *,
double *, double *, double *, double *, int *, char *, int *, char *, int *, int *, double *)'
void setulb_f77(int *n, int *m, double *x, double *l, double *u, int *nbd,
^
src/optimize/fortran_wrapper.c:36:29: note: expanded from macro 'setulb_f77'
#define setulb_f77 F77_FUNC (setulb, SETULB)
^
src/optimize/fortran_wrapper.c:38:6: error: a parameter list without types is
only allowed in a function definition
src/optimize/fortran_wrapper.c:36:30: note: expanded from macro 'setulb_f77'
#define setulb_f77 F77_FUNC (setulb, SETULB)
^
src/optimize/fortran_wrapper.c:46:6: error: function cannot return function
type 'void (int *, int *, double *, double *, double *, int *, double *, int
*, double *, double *, double *, int *)'
void lbfgs_f77(int *n, int *m, double *x, double *f, double *g,
^
src/optimize/fortran_wrapper.c:44:28: note: expanded from macro 'lbfgs_f77'
#define lbfgs_f77 F77_FUNC (lbfgs, LBFGS)
^
src/optimize/fortran_wrapper.c:46:6: error: a parameter list without types is
only allowed in a function definition
src/optimize/fortran_wrapper.c:44:29: note: expanded from macro 'lbfgs_f77'
#define lbfgs_f77 F77_FUNC (lbfgs, LBFGS)
^
src/optimize/fortran_wrapper.c:56:2: error: use of undeclared identifier
'setulb'
setulb_f77(n, m, x, l, u, nbd, f, g, factr, pgtol, wa,
^
src/optimize/fortran_wrapper.c:36:30: note: expanded from macro 'setulb_f77'
#define setulb_f77 F77_FUNC (setulb, SETULB)
^
src/optimize/fortran_wrapper.c:64:2: error: use of undeclared identifier 'lbfgs'
lbfgs_f77(n, m, x, f, g, diagco, diag, iprint, eps,
^
src/optimize/fortran_wrapper.c:44:29: note: expanded from macro 'lbfgs_f77'
#define lbfgs_f77 F77_FUNC (lbfgs, LBFGS)
Original issue reported on code.google.com by [email protected]
on 10 Apr 2014 at 5:23
What steps will reproduce the problem?
1. configure
2. make
What is the expected output? What do you see instead?
Expected Output: compilation with no errors.
Instead: I see "fatal error: 'tr1/unordered_map' file not found"
What version of the product are you using? On what operating system?
Latest version (0.4.1), on Mac OS X 10.9.2
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 10 Apr 2014 at 4:33
What steps will reproduce the problem?
1. estimate-ngram -h
What is the expected output? What do you see instead?
expected: help message
actual: estimate-ngram: error while loading shared libraries: libmitlm.so.0:
cannot open shared object file: No such file or directory
What version of the product are you using? On what operating system?
Using revision 48 on Ubuntu 11.10.
Please provide any additional information below.
After searching I found that libmitlm.so.0 is located at
/usr/local/lib/libmitlm.so.0
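The usual fix for this symptom is to tell the dynamic loader about /usr/local/lib, which is not on the default search path on some distributions. A hedged sketch (the install prefix is taken from the report above; adjust to your system):

```shell
# Per-shell fix: add the install directory to the loader's search path.
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

# System-wide alternative: refresh the loader cache (run as root):
#   sudo ldconfig /usr/local/lib

# After either, `estimate-ngram -h` should print the help message.
```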
Original issue reported on code.google.com by [email protected]
on 4 Jun 2012 at 4:59
What steps will reproduce the problem?
1. checkout revision 48
2. copy Makefile.example to Makefile
3. compile with "make -j"
What is the expected output? What do you see instead?
I expected a successful compilation. Instead I get a failed compilation with
non-zero return code
What version of the product are you using? On what operating system?
I'm using a clean checkout of revision 48 of mitlm. I'm compiling it on Mac OS
X Lion (10.7).
Please provide any additional information below.
I managed to solve the problem by performing the following steps:
1. Install Fortran with Homebrew: "brew install gfortran"
2. Add this setting to your Makefile after the FFLAGS line: "FC = gfortran"
3. Change the LDFLAGS line to "LDFLAGS = -L. -lgfortran -lmitlm"
4. Create a symlink to your libgfortran library: "ln -s /usr/local/Cellar/gfortran/4.2.4-5666.3/lib/gcc/i686-apple-darwin11/4.2.1/x86_64/libgfortran.a"
Once I did this, I was able to compile mitlm with no errors.
I haven't yet used the binaries for anything, but I noticed that running the
interpolate-ngram command without any input produces a segmentation fault:
$ ./interpolate-ngram
Interpolating component LMs...
Tying parameters across n-gram order...
Segmentation fault: 11
Since I've never used it before, I'm not sure if this is just me.
Original issue reported on code.google.com by [email protected]
on 25 Aug 2011 at 2:37
My Input:
interpolate-ngram -lm "test_model.lm, lm_giga_5k_nvp_3gram.arpa" -wl
combined_lang_model.lm -verbose
test_model.lm is one I created. It interpolates fine by itself.
lm_giga_5k_nvp_3gram.arpa does not work even if you interpolate it by itself.
What version of the product are you using? On what operating system?
version 48
Please provide any additional information below.
interpolate-ngram: src/NgramModel.cpp:329: void
NgramModel::LoadLM(std::vector<DenseVector<double>,
std::allocator<DenseVector<double> > >&, std::vector<DenseVector<double>,
std::allocator<DenseVector<double> > >&, ZFile&): Assertion `p >=
&line[lineLen]' failed.
Aborted
Original issue reported on code.google.com by [email protected]
on 14 Jul 2011 at 4:21
What steps will reproduce the problem?
1. Try to load arpa lm using evaluate-lm
I have tried with n-grams estimated using both MITLM and other toolsets.
What is the expected output? What do you see instead?
0.000 Loading LM exp/arpa/imd/imd.p7E-8.arpa...
terminate called after throwing an instance of 'std::invalid_argument'
what(): Unexpected file format.
What version of the product are you using? On what operating system?
Used both latest SVN trunk and 0.4.1 on Ubuntu Linux
Please provide any additional information below.
The problem seems to be on line 293 of NgramModel.cpp
"if (sscanf(line, "\\%u-ngrams:", &i) != 1 || i != o) {"
On line 382 we see the problem, when writing out the n-gram:
"fprintf(lmFile, "\n\\%lu-grams:\n", (unsigned long)o);"
Thus the fix is fairly straightforward; just change line 293 to:
"if (sscanf(line, "\\%u-grams:", &i) != 1 || i != o) {"
Original issue reported on code.google.com by [email protected]
on 18 May 2014 at 9:52
Dear all,
I've tried to install mitlm-0.4 to test Phonetisaurus, but the build failed
because of problems with the Fortran code and the Makefile (a missing library).
I've spent a long time on this, but still can't find the cause of the errors.
Please help me get out of this problem!
What version of the product are you using? On what operating system?
- Cygwin with gcc 4.3.4 (20090804)
- Windows 7
- mitlm-0.4
For additional information, please refer to the attached file (for the error
message).
.................................
gfortran -g -fPIC -fmessage-length=0 -O3 -DNDEBUG -funroll-loops -c -o
src/optimize/lbfgsb.o src/optimize/lbfgsb.f
f951: warning: -fPIC ignored for target (all code is position independent)
gfortran -g -fPIC -fmessage-length=0 -O3 -DNDEBUG -funroll-loops -c -o
src/optimize/lbfgs.o src/optimize/lbfgs.f
f951: warning: -fPIC ignored for target (all code is position independent)
ar rcs libmitlm.a src/util/RefCounter.o src/util/Logger.o
src/util/CommandOptions.o src/Vocab.o src/NgramVector.o src/NgramModel.o
src/NgramLM.o src/InterpolatedNgramLM.o src/Smoothing.o
src/MaxLikelihoodSmoothing.o src/KneserNeySmoothing.o src/PerplexityOptimizer.o
src/WordErrorRateOptimizer.o src/Lattice.o src/optimize/lbfgsb.o
src/optimize/lbfgs.o
g++ -g -Wall -fPIC -fmessage-length=0 -Isrc -O3 -DNDEBUG -funroll-loops -c -o
src/estimate-ngram.o src/estimate-ngram.cpp
src/estimate-ngram.cpp:1: warning: -fPIC ignored for target (all code is
position independent)
In file included from
/usr/lib/gcc/i686-pc-cygwin/4.3.4/include/c++/ext/hash_map:64,
from src/util/CommandOptions.h:37,
from src/estimate-ngram.cpp:36:
/usr/lib/gcc/i686-pc-cygwin/4.3.4/include/c++/backward/backward_warning.h:33:2:
warning: #warning This file includes at least one deprecated or antiquated
header which may be removed without further notice at a future date. Please use
a non-deprecated interface with equivalent functionality instead. For a listing
of replacement headers and interfaces, consult the file backward_warning.h. To
disable this warning use -Wno-deprecated.
g++ src/estimate-ngram.o -o estimate-ngram -L. -lgfortran -lmitlm -O3
-funroll-loops
./libmitlm.a(lbfgs.o): In function `mcsrch':
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:670: undefined
reference to `__gfortran_st_write'
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:670: undefined
reference to `__gfortran_st_write_done'
./libmitlm.a(lbfgs.o): In function `lb1':
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:469: undefined
reference to `__gfortran_st_write'
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:469: undefined
reference to `__gfortran_transfer_integer'
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:469: undefined
reference to `__gfortran_transfer_integer'
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:469: undefined
reference to `__gfortran_transfer_real'
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:469: undefined
reference to `__gfortran_transfer_real'
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:469: undefined
reference to `__gfortran_transfer_real'
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:469: undefined
reference to `__gfortran_st_write_done'
./libmitlm.a(lbfgs.o):/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:480: more undefined references to `__gfortran_transfer_real' follow
./libmitlm.a(lbfgs.o): In function `lb1':
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:480: undefined
reference to `__gfortran_st_write_done'
./libmitlm.a(lbfgs.o): In function `lbfgs':
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:246: undefined
reference to `__gfortran_st_write'
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgs.f:246: undefined
reference to `__gfortran_st_write_done'
.................
./libmitlm.a(lbfgsb.o): In function `setulb':
/home/seng/Phonetisaurus/mitlm.0.4/src/optimize/lbfgsb.f:196: undefined
reference to `__gfortran_compare_string'
collect2: ld returned 1 exit status
make: *** [estimate-ngram] Error 1
Thank you in advance for your help !
Seng,
Original issue reported on code.google.com by [email protected]
on 15 Feb 2013 at 12:25
Attachments:
1. I decompressed the mitlm-0.4.1.tar.gz file on OS X Yosemite.
From Terminal:
2. ./compile
3. make -j
Error:
Input is: estimate-ngram -text mysent.txt -write-lm mysent.lm
But then I get: estimate-ngram: command not found
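"command not found" here just means the shell cannot find the freshly built binary: the build directory is not on PATH, and `make install` has not been run. A small stand-alone demonstration of the principle (the script below is a dummy stand-in, not the real estimate-ngram):

```shell
# Build a dummy "estimate-ngram" in a local directory.
mkdir -p demo_build
printf '#!/bin/sh\necho built ok\n' > demo_build/estimate-ngram
chmod +x demo_build/estimate-ngram

# Invoking it by bare name fails, because the directory is not on PATH...
estimate-ngram 2>/dev/null || echo "not on PATH"

# ...while an explicit path works.
demo_build/estimate-ngram
```

So for the reporter: either run ./estimate-ngram from the build directory, or run make install (which typically places it in /usr/local/bin) and retry.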
Original issue reported on code.google.com by [email protected]
on 19 Jun 2015 at 9:31
I have two 4-gram models, 191567768 and 38095008 bytes in MITLM binary format.
When I use interpolate-ngram to linearly interpolate (LI) them without any
perplexity optimization, it works fine. However, when I add the
--optimize-perplexity option, I get a segmentation fault.
This is what happens:
$ ~/lbin/mitlm-svn/interpolate-ngram -l build/lm/tmp/model1.mitlm
model2.mitlm -o 4 --optimize-perplexity dev.txt -write-lm out.arpa.gz
Loading component LM model1.mitlm...
Loading component LM model2.mitlm...
Interpolating component LMs...
Interpolation Method = LI
Loading development set dev.txt...
Optimizing 1 parameters...
Segmentation fault (core dumped)
Backtrace from gdb:
(gdb) bt
#0 0x0000000000430c9c in InterpolatedNgramLM::_EstimateProbsMasked
(this=0x7fffa952a110, params=@0x7fffa952a210, pMask=0x5b4280) at
src/InterpolatedNgramLM.cpp:342
#1 0x00000000004316dd in InterpolatedNgramLM::Estimate
(this=0x7fffa952a110, params=@0x7fffa952a3a0, pMask=0x5b4280) at
src/InterpolatedNgramLM.cpp:214
#2 0x0000000000441f7a in PerplexityOptimizer::ComputeEntropy
(this=0x7fffa952a250, params=@0x7fffa952a3a0) at src/PerplexityOptimizer.cpp:61
#3 0x0000000000443381 in
PerplexityOptimizer::ComputeEntropyFunc::operator() (this=0x7fffa9529f90,
params=@0x7fffa952a3a0) at src/PerplexityOptimizer.h:64
#4 0x0000000000445076 in
MinimizeLBFGSB<PerplexityOptimizer::ComputeEntropyFunc>
(func=@0x7fffa9529f90, x=@0x7fffa952a3a0, numIter=@0x7fffa9529f8c,
step=1e-08, factr=10000000,
pgtol=1.0000000000000001e-05, maxIter=15000) at src/optimize/LBFGSB.h:79
#5 0x0000000000442643 in PerplexityOptimizer::Optimize
(this=0x7fffa952a250, params=@0x7fffa952a3a0, technique=LBFGSBOptimization)
at src/PerplexityOptimizer.cpp:122
#6 0x000000000046db3e in main (argc=10, argv=0x7fffa952aab8) at
src/interpolate-ngram.cpp:277
A similar thing happens with -i CM, but it crashes earlier:
$ ~/lbin/mitlm-svn/interpolate-ngram -l build/lm/tmp/model1.mitlm
model2.mitlm -o 4 --optimize-perplexity dev.txt -write-lm out.arpa.gz -i CM
Interpolating component LMs...
Interpolation Method = CM
Loading counts for model1.mitlm from log:model1.counts...
Loading counts for model2.mitlm from log:model2.counts...
Loading development set dev.txt...
Segmentation fault (core dumped)
(gdb) bt
#0 0x0000000000429924 in Copy<unsigned char const*, unsigned char*>
(input=0xa0 <Address 0xa0 out of bounds>, begin=0x2aaad1028010 "",
end=0x2aaad15da9f0 "")
at src/util/FastIO.h:56
#1 0x000000000042db71 in DenseVector<unsigned char>::operator=
(this=0x5b4810, v=@0x5b3ea0) at src/vector/DenseVector.tcc:146
#2 0x0000000000431974 in InterpolatedNgramLM::GetMask
(this=0x7fff0e36af40, probMaskVectors=@0x7fff0e36ad30,
bowMaskVectors=@0x7fff0e36ad10) at src/InterpolatedNgramLM.cpp:153
#3 0x0000000000442c6e in PerplexityOptimizer::LoadCorpus
(this=0x7fff0e36b080, corpusFile=@0x7fff0e36b2f0) at
src/PerplexityOptimizer.cpp:55
#4 0x000000000046db01 in main (argc=12, argv=0x7fff0e36b8e8) at
src/interpolate-ngram.cpp:274
BTW, the same thing works with small toy models.
I'm using MITLM from SVN, Linux, amd64.
Original issue reported on code.google.com by [email protected]
on 10 Dec 2008 at 1:41
What steps will reproduce the problem?
Create a large counts file in which there is an n-gram (e.g. "foo bar baz")
whose suffix n-gram ("bar baz") doesn't exist earlier in the file.
Run `estimate-ngram -wl lm.arpa -counts counts` on it.
Note this doesn't always happen consistently for me with smaller count files,
but seems to replicate fairly consistently with larger (or at least
middle-sized) files.
What is the expected output? What do you see instead?
I'd ideally expect it to allow a language model to be built in this case, even if
it means removing/skipping over the ngram in question, or making some
assumption about the count for the missing suffix (e.g. same as the
higher-order ngram).
I realise that these missing suffixes won't occur if I use MITLM itself to
compute the counts from a corpus, however if dealing with large amounts of
count-based source data from some other tools/sources, it's possible for these
kinds of constraints to be violated accidentally due to data corruption or bugs
beyond your control, and so it would be convenient if MITLM could cope
gracefully with these cases.
Alternatively if this is a WONTFIX then it would be good to at least document
what the constraint is on acceptable input for counts files, and give a more
friendly error message if the constraint is violated, so people know how to fix
up their input files in order to get MITLM to work.
Currently what you see is:
estimate-ngram: src/NgramModel.cpp:811: void
mitlm::NgramModel::_ComputeBackoffs(): Assertion `allTrue(backoffs !=
NgramVector::Invalid)' failed.
Aborted (core dumped)
What version of the product are you using? On what operating system?
Built from latest github master, Ubuntu 14.04.1
Cheers!
Original issue reported on code.google.com by [email protected]
on 11 Feb 2015 at 12:08
The crash only happens if the ngram order is higher than 1, and only if the #
occurs at the start of a token.
I'm guessing this is because it interprets a # at the beginning of a line in a
text counts file as a comment and skips it, meaning a unigram beginning with a
# is missing from the term dictionary when it's encountered in a later bigram.
What steps will reproduce the problem?
$ estimate-ngram -wc counts -text <(echo 'a #hashtag')
0.001 Loading corpus /dev/fd/63...
0.002 Smoothing[1] = ModKN
0.002 Smoothing[2] = ModKN
0.002 Smoothing[3] = ModKN
0.002 Set smoothing algorithms...
0.002 Saving counts to counts...
$ cat counts
<s> 1
a 1
#hashtag 1
<s> a 1
a #hashtag 1
#hashtag </s> 1
<s> a #hashtag 1
a #hashtag </s> 1
$ estimate-ngram -counts counts -wl lm.arpa
0.001 Loading counts counts...
estimate-ngram: src/NgramModel.cpp:800: void
mitlm::NgramModel::_ComputeBackoffs(): Assertion `allTrue(backoffs !=
NgramVector::Invalid)' failed.
Aborted (core dumped)
What version of the product are you using? On what operating system?
Built from latest master on github. Ubuntu 14.04.1
Original issue reported on code.google.com by [email protected]
on 10 Feb 2015 at 6:39
Hi,
I am trying to build and interpolate very small language models (most higher
order n-grams are unique). I am not able to interpolate the ARPAs, because it
always throws the following error:
$ estimate-ngram -order 3 -text 1.txt -s ML -wl 1.arpa
...
$ estimate-ngram -order 3 -text 2.txt -s ML -wl 2.arpa
...
$ interpolate-ngram -order 3 -lm "1.arpa, 2.arpa" -wl 3.arpa
...
interpolate-ngram: src/InterpolatedNgramLM.cpp:327: void
mitlm::InterpolatedNgramLM::_EstimateBows(): Assertion `!anyTrue(isnan(bows))'
failed.
Aborted (core dumped)
PS 1. The same happens with an open vocabulary (-unk).
PS 2. My minimalistic ARPA has most of the BOWs set to -99.
PS 3. There are quite a lot of n-grams with -log(p) == 0.00000 in the ARPA.
PS 4. I found out that the "</s>" 1-gram _DOES_NOT_ have a BOW in the ARPA.
PS 5. I am using mitlm-0.4.1.
Any ideas?
Original issue reported on code.google.com by [email protected]
on 22 Jul 2015 at 8:03
I have a background unigram model (bg.arpa), some additional training data
(train.txt) and some dev text (dev.txt). I want to create an interpolated
unigram that optimizes the perplexity of dev.txt. I also need open
vocabulary LM (-unk).
I execute:
$ interpolate-ngram -l bg.arpa -t train.txt -op dev.txt -o 1 -wf
"entropy:train.txt" -unk 1 -v etc/vocab
I get:
...
Loading component LM bg.arpa...
-unk with -lm is not implemented yet.
-- RefCounter----------
map[0x2aaaab5a5010] = 0
map[0x2aaaab0dc010] = 1
map[0x5a9c60] = 0
map[0x5a94f0] = 0
map[0x2aaaab1dd010] = 1
map[0x2aaaab018010] = 1
map[0x5a9f00] = 0
map[0x5a9cf0] = 0
-----------------------
Without -unk it seems to work fine.
OK, I understand it's not implemented, but maybe it's just a simple fix...
Otherwise, I think I know a workaround. Thanks.
Original issue reported on code.google.com by [email protected]
on 26 Feb 2009 at 4:42
What steps will reproduce the problem?
1. Download the latest 0.4.1 tarball
2. Experience failure to build
Solution:
Add #include <string> to FastIO.h, after #include <cstring>
Recipe to fix:
Use the Xcode toolchain, not Homebrew GCC
Download gfortran 4.9 from https://gcc.gnu.org/wiki/GFortranBinaries#MacOS
Check out the latest source: svn checkout http://mitlm.googlecode.com/svn/trunk/ mitlm-read-only
Add #include <string> to FastIO.h
autogen.sh
make
make install
Original issue reported on code.google.com by [email protected]
on 24 Nov 2014 at 1:13
Hello,
I tried to create a 4-gram language model with the help of your estimate-ngram
tool, which led to the following debug output:
0.000 Loading vocab wlist...
0.170 Loading corpus corpus.txt...
estimate-ngram: src/vector/DenseVector.tcc:406: void
DenseVector<T>::_allocate() [with T = int]: Assertion `_data' failed.
I used the command:
estimate-ngram -order 4 -v wlist -unk -t corpus.txt -wl arpa
When I try to create a trigram model from the same corpus, the tool runs the
task like a charm.
Original issue reported on code.google.com by [email protected]
on 18 Jun 2010 at 8:47
What steps will reproduce the problem?
1. Download mitlm-0.4 and extract it
2. Go to the mitlm-0.4 directory
3. Run make -j
What is the expected output? What do you see instead?
Successful compilation of the tool
What version of the product are you using? On what operating system?
mitlm-0.4
OS: Fedora 12
Please provide any additional information below.
The errors:
src/vector/VectorOps.h:168: error: expected primary-expression before ‘>’
token
src/vector/VectorOps.h:168: error: no matching function for call to ‘min()’
In file included from src/NgramModel.h:42,
from src/NgramModel.cpp:43:
src/Vocab.h: In member function ‘VocabIndex Vocab::Find(const char*) const’:
src/Vocab.h:78: error: ‘strlen’ was not declared in this scope
src/Vocab.h: In member function ‘VocabIndex Vocab::Add(const char*)’:
src/Vocab.h:80: error: ‘strlen’ was not declared in this scope
src/NgramModel.cpp: In member function ‘void
NgramModel::LoadCorpus(std::vector<DenseVector<int>,
std::allocator<DenseVector<int> > >&, ZFile&, bool)’:
src/NgramModel.cpp:93: error: ‘strncmp’ was not declared in this scope
src/NgramModel.cpp:93: error: ‘strcmp’ was not declared in this scope
src/NgramModel.cpp: In member function ‘void
NgramModel::LoadLM(std::vector<DenseVector<double>,
std::allocator<DenseVector<double> > >&, std::vector<DenseVector<double>,
std::allocator<DenseVector<double> > >&, ZFile&)’:
src/NgramModel.cpp:264: error: ‘strcmp’ was not declared in this scope
src/NgramModel.cpp:297: error: ‘strlen’ was not declared in this scope
src/NgramModel.cpp:323: error: ‘strcmp’ was not declared in this scope
src/NgramModel.cpp:344: error: ‘strcmp’ was not declared in this scope
src/NgramModel.cpp: In member function ‘void
NgramModel::LoadEvalCorpus(std::vector<DenseVector<int>,
std::allocator<DenseVector<int> > >&, std::vector<DenseVector<int>,
std::allocator<DenseVector<int> > >&, BitVector&, ZFile&, size_t&, size_t&)
const’:
src/NgramModel.cpp:478: error: ‘strncmp’ was not declared in this scope
src/NgramModel.cpp:478: error: ‘strcmp’ was not declared in this scope
src/NgramModel.cpp: In member function ‘void
NgramModel::LoadComputedFeatures(std::vector<DenseVector<double>,
std::allocator<DenseVector<double> > >&, const char*, size_t) const’:
src/NgramModel.cpp:586: error: ‘strcmp’ was not declared in this scope
src/NgramModel.cpp:611: error: ‘strcmp’ was not declared in this scope
src/NgramModel.cpp: In member function ‘void
NgramModel::_LoadFrequency(std::vector<DenseVector<double>,
std::allocator<DenseVector<double> > >&, ZFile&, size_t) const’:
src/NgramModel.cpp:858: error: ‘strcmp’ was not declared in this scope
src/NgramModel.cpp:870: error: ‘strncmp’ was not declared in this scope
src/NgramModel.cpp: In member function ‘void
NgramModel::_LoadEntropy(std::vector<DenseVector<double>,
std::allocator<DenseVector<double> > >&, ZFile&, size_t) const’:
src/NgramModel.cpp:937: error: ‘strcmp’ was not declared in this scope
src/NgramModel.cpp:951: error: ‘strncmp’ was not declared in this scope
make: *** [src/NgramModel.o] Error 1
In file included from /usr/lib/gcc/i686-redhat-linux/4.4.2/../../../../include/c++/4.4.2/ext/hash_map:59,
from src/util/RefCounter.h:38,
from src/util/SharedPtr.h:38,
from src/NgramLM.h:39,
from src/KneserNeySmoothing.cpp:37:
/usr/lib/gcc/i686-redhat-linux/4.4.2/../../../../include/c++/4.4.2/backward/backward_warning.h:28:2:
warning: #warning This file includes at least one deprecated or antiquated
header which may be removed without further notice at a future date. Please
use a non-deprecated interface with equivalent functionality instead. For a
listing of replacement headers and interfaces, consult the file
backward_warning.h. To disable this warning use -Wno-deprecated.
In file included from src/vector/DenseVector.tcc:40,
from src/vector/DenseVector.h:144,
from src/Types.h:40,
from src/NgramLM.h:41,
from src/KneserNeySmoothing.cpp:38:
src/util/FastIO.h: In function ‘bool getline(FILE*, char*, size_t)’:
src/util/FastIO.h:111: error: ‘strlen’ was not declared in this scope
src/util/FastIO.h: In function ‘bool getline(FILE*, char*, size_t,
size_t*)’:
src/util/FastIO.h:123: error: ‘strlen’ was not declared in this scope
src/util/FastIO.h: In function ‘void WriteHeader(FILE*, const char*)’:
src/util/FastIO.h:184: error: ‘strlen’ was not declared in this scope
src/util/FastIO.h: In function ‘void VerifyHeader(FILE*, const char*)’:
src/util/FastIO.h:237: error: ‘strlen’ was not declared in this scope
src/util/FastIO.h:239: error: ‘strncmp’ was not declared in this scope
In file included from src/Types.h:42,
from src/NgramLM.h:41,
from src/KneserNeySmoothing.cpp:38:
src/vector/VectorOps.h: In function ‘typename V::ElementType min(const
Vector<I>&)’:
src/vector/VectorOps.h:159: error: ‘numeric_limits’ is not a member of
‘std’
src/vector/VectorOps.h:159: error: expected primary-expression before ‘>’
token
src/vector/VectorOps.h:159: error: ‘::max’ has not been declared
src/vector/VectorOps.h: In function ‘typename V::ElementType max(const
Vector<I>&)’:
src/vector/VectorOps.h:168: error: ‘numeric_limits’ is not a member of
‘std’
src/vector/VectorOps.h:168: error: expected primary-expression before ‘>’
token
src/vector/VectorOps.h:168: error: no matching function for call to ‘min()’
In file included from src/Vocab.h:41,
from src/NgramLM.h:42,
from src/KneserNeySmoothing.cpp:38:
src/util/ZFile.h: In member function ‘bool ZFile::endsWith(const char*,
const char*)’:
src/util/ZFile.h:51: error: ‘strlen’ was not declared in this scope
src/util/ZFile.h:54: error: ‘strncmp’ was not declared in this scope
In file included from src/NgramLM.h:42,
from src/KneserNeySmoothing.cpp:38:
src/Vocab.h: In member function ‘VocabIndex Vocab::Find(const char*) const’:
src/Vocab.h:78: error: ‘strlen’ was not declared in this scope
src/Vocab.h: In member function ‘VocabIndex Vocab::Add(const char*)’:
src/Vocab.h:80: error: ‘strlen’ was not declared in this scope
In file included from src/vector/DenseVector.h:144,
from src/Types.h:40,
from src/NgramLM.h:41,
from src/KneserNeySmoothing.cpp:38:
src/vector/DenseVector.tcc: In member function ‘void DenseVector<T>::set(T)
[with T = double]’:
src/KneserNeySmoothing.cpp:163: instantiated from here
src/vector/DenseVector.tcc:366: error: ‘memset’ was not declared in this
scope
make: *** [src/KneserNeySmoothing.o] Error 1
In file included from /usr/lib/gcc/i686-redhat-linux/4.4.2/../../../../include/c++/4.4.2/ext/hash_map:59,
from src/util/RefCounter.h:38,
from src/vector/DenseVector.tcc:37,
from src/vector/DenseVector.h:143,
from src/Types.h:39,
from src/Lattice.h:41,
from src/Lattice.cpp:41:
/usr/lib/gcc/i686-redhat-linux/4.4.2/../../../../include/c++/4.4.2/backward/backward_warning.h:28:2:
warning: #warning This file includes at least one deprecated or antiquated
header which may be removed without further notice at a future date. Please
use a non-deprecated interface with equivalent functionality instead. For a
listing of replacement headers and interfaces, consult the file
backward_warning.h. To disable this warning use -Wno-deprecated.
In file included from src/Lattice.cpp:41:
src/util/FastIO.h: In function ‘bool getline(FILE*, char*, size_t)’:
src/util/FastIO.h:111: error: ‘strlen’ was not declared in this scope
src/util/FastIO.h: In function ‘bool getline(FILE*, char*, size_t,
size_t*)’:
src/util/FastIO.h:123: error: ‘strlen’ was not declared in this scope
src/util/FastIO.h: In function ‘void WriteHeader(FILE*, const char*)’:
src/util/FastIO.h:184: error: ‘strlen’ was not declared in this scope
src/util/FastIO.h: In function ‘void VerifyHeader(FILE*, const char*)’:
src/util/FastIO.h:237: error: ‘strlen’ was not declared in this scope
src/util/FastIO.h:239: error: ‘strncmp’ was not declared in this scope
In file included from src/Lattice.h:41,
from src/Lattice.cpp:42:
src/util/ZFile.h: In member function ‘bool ZFile::endsWith(const char*,
const char*)’:
src/util/ZFile.h:51: error: ‘strlen’ was not declared in this scope
src/util/ZFile.h:54: error: ‘strncmp’ was not declared in this scope
In file included from src/NgramLM.h:42,
from src/Lattice.h:43,
from src/Lattice.cpp:42:
src/Vocab.h: In member function ‘VocabIndex Vocab::Find(const char*) const’:
src/Vocab.h:78: error: ‘strlen’ was not declared in this scope
src/Vocab.h: In member function ‘VocabIndex Vocab::Add(const char*)’:
src/Vocab.h:80: error: ‘strlen’ was not declared in this scope
src/Lattice.cpp: In member function ‘void Lattice::LoadLattice(ZFile&)’:
src/Lattice.cpp:109: error: ‘strcmp’ was not declared in this scope
src/Lattice.cpp:112: error: ‘strcmp’ was not declared in this scope
make: *** [src/Lattice.o] Error 1
Original issue reported on code.google.com by [email protected]
on 17 Jan 2010 at 12:26
Hi Paul,
I've experienced some problems with the linear interpolation process and I am
not sure how to solve them. When I try to interpolate a 3-component LM I get
the warning:
--------
THE SEARCH DIRECTION IS NOT A DESCENT DIRECTION
IFLAG= -1
LINE SEARCH FAILED. SEE DOCUMENTATION OF ROUTINE MCSRCH
ERROR RETURN OF LINE SEARCH: INFO= 0
POSSIBLE CAUSES: FUNCTION OR GRADIENT ARE INCORRECT
OR INCORRECT TOLERANCES
--------
What I did was create three single language models using
estimate-ngram -order 3 -v wlist -unk true -t train1.txt -opt-perp opt1.txt -wl
arpa_a.gz
estimate-ngram -order 3 -v wlist -unk true -t train2.txt -opt-perp opt2.txt -wl
arpa_b.gz
estimate-ngram -order 3 -v wlist -unk true -t train3.txt -opt-perp opt3.txt -wl
arpa_c.gz
then unpacked the components
gzip -d -f arpa_a.gz
gzip -d -f arpa_b.gz
gzip -d -f arpa_c.gz
then interpolated using
interpolate-ngram -l "arpa_a,arpa_b,arpa_c" -opt-perp int-opt.txt -wl arpa_full
The process actually yields a language model I can use afterwards, but I am
not sure what the warning/error is about and what I have to do to fix it.
My system is Linux 2.6.31.12-0.2-desktop x86_64 with 8 GB RAM and a quad-core
AMD 2360SE.
Thanks in advance!
Original issue reported on code.google.com by [email protected]
on 25 Oct 2010 at 12:29
What steps will reproduce the problem?
1. Run estimate-ngram on an input text file with lines longer than 4096
characters, where the 4096th character is in the middle of a word.
2. Check the LM file for partial words created by splitting the word as above.
What is the expected output? What do you see instead?
In a very long line containing e.g. the word "defect", where "c" is the 4096th
character, the non-words "def" and "ect" appear in the LM.
What version of the product are you using? On what operating system?
0.4.1 on Ubuntu 12.04
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 18 Jul 2014 at 7:28
Hi, I would like to train an LM using mitlm. Is it possible to use UTF-8 encoded
data? I'm also interested in whether it is possible to invoke case-insensitive
handling of data. Thanks for the answers. Jan
Original issue reported on code.google.com by [email protected]
on 28 Sep 2010 at 9:56
Hi, I'm trying to interpolate two fairly straightforward 3-gram LMs with the
interpolate-ngram tool.
The command I'm running is,
-------------------
$ interpolate-ngram -o 3 -l lm1.arpa,lm2.arpa -wl lm1lm2.arpa
Loading component LM lm1.arpa...
Loading component LM lm2.arpa...
Segmentation fault
-------------------
The first LM was created with the estimate-ngram tool from a fairly small
training text (approx. 70 MB):
$ estimate-ngram -t lm1.txt -wl lm1.arpa -o 3
The second lm is the gigaword 64k NVP 3gram model from Keith Vertanen's
open source LM page,
http://www.keithv.com/software/giga/
My guess is that there is something about the KV model that
interpolate-ngram doesn't like, but it isn't terribly clear what that might be.
Also, neither of the vocabularies is a subset of the other (although I
don't know whether or not that is relevant).
Original issue reported on code.google.com by [email protected]
on 28 Feb 2010 at 1:47
When I use interpolate-ngram to interpolate two models with CM or GLI and
perplexity optimization, I get the following faults:
1st:
interpolate-ngram -lm "model1.lm, model2.lm" -smoothing ModKN -interpolation CM
-opt-perp dev-set.txt -write-lm CM-model.lm
Loading component LM model1.lm...
Loading component LM model2.lm...
Interpolating component LMs...
Tying parameters across n-gram order...
Interpolation Method = CM
Loading feature for model1.lm from log:sumhist:model1.effcounts...
terminate called after throwing an instance of 'std::runtime_error'
what(): Cannot open file
Aborted
2nd:
interpolate-ngram -lm "model1.lm, model2.lm" -smoothing ModKN -interpolation
GLI -opt-perp dev-set.txt -write-lm GLI-model.lm
Loading component LM model1.lm...
Loading component LM model2.lm...
Interpolating component LMs...
Tying parameters across n-gram order...
Interpolation Method = GLI
Segmentation fault
I'm using MITLM v0.4 from SVN under Linux, Intel i7.
Jan
Original issue reported on code.google.com by [email protected]
on 8 Sep 2010 at 7:11
I'm not sure if this is a feature or a bug, but when using estimate-ngram
with the -v option, words that are specified in the vocabulary but not seen
in the training data do not appear in the resulting LM. It would be nice if
there were a way to apply some discounting to also estimate the unigram
probabilities of unseen words (i.e. like SRILM's ngram-count does).
I'm using MITLM from SVN.
Original issue reported on code.google.com by [email protected]
on 12 Dec 2008 at 3:22
MITLM tools cannot load gzipped ARPA LM files, even those produced by
estimate-ngram or interpolate-ngram.
This is what happens:
$ ~/lbin/mitlm-svn/evaluate-ngram --read-lm tmp.arpa.gz
--evaluate-perplexity dev.txt
Loading LM tmp.arpa.gz...
terminate called after throwing an instance of 'std::invalid_argument'
what(): Unexpected file format.
Backtrace from gdb:
(gdb) bt
#0 0x00000035c102ee25 in raise () from /lib64/libc.so.6
#1 0x00000035c1030770 in abort () from /lib64/libc.so.6
#2 0x00000035c27c0f74 in __gnu_cxx::__verbose_terminate_handler () from
/usr/lib64/libstdc++.so.6
#3 0x00000035c27bf0b6 in std::set_unexpected () from /usr/lib64/libstdc++.so.6
#4 0x00000035c27bf0e3 in std::terminate () from /usr/lib64/libstdc++.so.6
#5 0x00000035c27bf1ca in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6 0x00000000004181ac in NgramModel::LoadLM (this=0x5adff0,
probVectors=@0x7fff7925fc28, bowVectors=@0x7fff7925fc40,
lmFile=@0x7fff7925fe50)
at src/NgramModel.cpp:289
#7 0x0000000000426d1a in ArpaNgramLM::LoadLM (this=0x7fff7925fc10,
lmFile=@0x7fff7925fe50) at src/NgramLM.cpp:141
#8 0x000000000046c38f in main (argc=5, argv=0x7fff79260118) at
src/evaluate-ngram.cpp:150
I'm using MITLM from SVN, Linux, amd64.
I attached the tmp.arpa.gz file (produced with estimate-ngram).
Loading uncompressed ARPA files works fine.
Original issue reported on code.google.com by [email protected]
on 9 Dec 2008 at 1:30
Attachments:
I'm trying to follow the tutorial at
https://code.google.com/p/mitlm/wiki/Tutorial, and want to make sure I'm doing
it right. Where can I find the sample files like lectures.txt?
Original issue reported on code.google.com by [email protected]
on 14 Apr 2015 at 10:59
Error after the make command.
I want to build a language model, but the following error occurs.
Check the attached make file.
MITLM: latest version from the SVN repository.
Ubuntu 12.04, 32-bit system.
libtool: link: /usr/bin/nm -B src/util/.libs/CommandOptions.o
src/util/.libs/RefCounter.o src/util/.libs/Logger.o src/.libs/NgramLM.o
src/.libs/Vocab.o src/.libs/PerplexityOptimizer.o src/.libs/Lattice.o
src/.libs/Smoothing.o src/.libs/NgramModel.o src/.libs/NgramVector.o
src/.libs/MaxLikelihoodSmoothing.o src/.libs/KneserNeySmoothing.o
src/.libs/InterpolatedNgramLM.o src/optimize/.libs/lbfgs.o
src/optimize/.libs/lbfgsb.o src/optimize/.libs/fortran_wrapper.o
src/.libs/WordErrorRateOptimizer.o | sed -n -e 's/^.*[
]\([ABCDGIRSTW][ABCDGIRSTW]*\)[ ][ ]*\([_A-Za-z][_A-Za-z0-9]*\)$/\1 \2 \2/p'
| sed '/ __gnu_lto/d' | /bin/sed 's/.* //' | sort | uniq > .libs/libmitlm.exp
libtool: link: /bin/grep -E -e "mitlm" ".libs/libmitlm.exp" >
".libs/libmitlm.expT"
libtool: link: mv -f ".libs/libmitlm.expT" ".libs/libmitlm.exp"
libtool: link: g++ -fPIC -DPIC -shared -nostdlib
/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crti.o
/usr/lib/gcc/i686-linux-gnu/4.6/crtbeginS.o src/util/.libs/CommandOptions.o
src/util/.libs/RefCounter.o src/util/.libs/Logger.o src/.libs/NgramLM.o
src/.libs/Vocab.o src/.libs/PerplexityOptimizer.o src/.libs/Lattice.o
src/.libs/Smoothing.o src/.libs/NgramModel.o src/.libs/NgramVector.o
src/.libs/MaxLikelihoodSmoothing.o src/.libs/KneserNeySmoothing.o
src/.libs/InterpolatedNgramLM.o src/optimize/.libs/lbfgs.o
src/optimize/.libs/lbfgsb.o src/optimize/.libs/fortran_wrapper.o
src/.libs/WordErrorRateOptimizer.o -Wl,-rpath -Wl,/home/java/test/mitlm/.libs
-L. -lgfortran /home/java/test/mitlm/.libs/libmitlm.so
-L/usr/lib/gcc/i686-linux-gnu/4.6
-L/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu
-L/usr/lib/gcc/i686-linux-gnu/4.6/../../../../lib -L/lib/i386-linux-gnu
-L/lib/../lib -L/usr/lib/i386-linux-gnu -L/usr/lib/../lib
-L/usr/lib/gcc/i686-linux-gnu/4.6/../../.. -lstdc++ -lm -lc -lgcc_s
/usr/lib/gcc/i686-linux-gnu/4.6/crtendS.o
/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crtn.o -Wl,-soname
-Wl,libmitlm.so.0 -Wl,-retain-symbols-file -Wl,.libs/libmitlm.exp -o
.libs/libmitlm.so.0.0.0
g++: error: /home/java/test/mitlm/.libs/libmitlm.so: No such file or directory
make[1]: *** [libmitlm.la] Error 1
make[1]: Leaving directory `/home/java/test/mitlm'
make: *** [all-recursive] Error 1
Original issue reported on code.google.com by [email protected]
on 21 Feb 2015 at 7:18
Attachments:
When I run something like:
interpolate-ngram -l lm1.mitlm lm2.mitlm --write-lm tmp3.arpa.gz
--optimize-perplexity dev.txt
Loading component LM lm1.mitlm...
Loading component LM lm2.mitlm...
Interpolating component LMs...
Interpolation Method = LI
Loading development set dev.txt...
Segmentation fault (core dumped)
gdb shows:
(gdb) bt
#0 0x0000000000447d9f in PerplexityOptimizer::LoadCorpus
(this=0x7fffd28165d0, corpusFile=Variable "corpusFile" is not available.
) at src/util/FastIO.h:54
#1 0x0000000000479ee5 in main (argc=8, argv=0x7fffd2816de8) at
src/interpolate-ngram.cpp:270
I'm using mitlm from SVN under Linux, amd64.
Original issue reported on code.google.com by [email protected]
on 3 Dec 2008 at 2:20
Just in case anyone comes across the same problem:
The linker was complaining about -lg2c and an undefined reference to
ApplySort(...). Here are a few steps leading to a successful compilation:
1. install gfortran:
sudo apt-get install gfortran
2. in the Makefile of mitlm, it is needed to set:
LDFLAGS = -L. -lgfortran -lmitlm
FC = gfortran
...the LDFLAGS line is already there, but it is needed to replace -lg2c
with -lgfortran
3. open the file NgramModel.cpp in a text editor and comment out the
definition of the ApplySort template method. Then copy its code and paste it
into NgramModel.h right after the declaration of ApplySort. The code of a
template method simply has to be in the header file.
4. rm src/NgramLM.o
5. make
Thank you Bo-June, for the toolkit.
Best regards,
Michal
Original issue reported on code.google.com by [email protected]
on 14 Jan 2010 at 11:29
When running:
estimate-ngram -read-text tmp.txt --write-binary-lm tmp.mitml
Loading corpus tmp.txt...
Smoothing[1] = ModKN
Smoothing[2] = ModKN
Smoothing[3] = ModKN
Set smoothing algorithms...
Segmentation fault (core dumped)
When I add -w tmp.wfeatures, where tmp.wfeatures is an empty file, it works OK.
I'm using mitlm from SVN under Linux, amd64.
Original issue reported on code.google.com by [email protected]
on 3 Dec 2008 at 1:53
I am getting an error very similar to the -i CM case in Issue 4, with a
failure in the Copy function (though I get it while copying the LM during
smoothing). This happens during the 6th pass, and only with -o 6
(not -o 5) and only on a 300+ MB text, so it seems like it might be a memory
issue of some sort.
Starting program: /data/homes/benp/lmlowdata/mitlm.0.4~/estimate-ngram -t
spec.txt -wc spec.counts -o 6
Loading corpus all-Podiatry-all-all.txt...
Smoothing[1] = ModKN
Smoothing[2] = ModKN
Smoothing[3] = ModKN
Smoothing[4] = ModKN
Smoothing[5] = ModKN
Smoothing[6] = ModKN
Set smoothing algorithms...
Program received signal SIGSEGV, Segmentation fault.
KneserNeySmoothing::Initialize (this=0x86d7fe8, pLM=0xbfffb9c0, order=6) at
src/util/FastIO.h:56
56 *begin = *input;
(gdb) bt
#0 KneserNeySmoothing::Initialize (this=0x86d7fe8, pLM=0xbfffb9c0,
order=6) at src/util/FastIO.h:56
#1 0x08074743 in NgramLM::SetSmoothingAlgs (this=0xbfffb9c0,
smoothings=@0xbfffb71c) at src/NgramLM.cpp:287
#2 0x0807747b in NgramLM::Initialize (this=0xbfffb9c0, vocab=0x0,
useUnknown=false, text=0xbfffd160 "all-Podiatry-all-all.txt", counts=0x0,
smoothingDesc=0x80c8986 "ModKN", featureDesc=0x0) at src/NgramLM.cpp:225
#3 0x0804daa9 in main (argc=Cannot access memory at address 0x0
) at src/estimate-ngram.cpp:120
Original issue reported on code.google.com by [email protected]
on 3 Mar 2009 at 11:03
What steps will reproduce the problem?
1. Check out trunk (r48)
2. ./autogen.sh
3. make
What is the expected output? What do you see instead?
Build fails with:
make[1]: *** No rule to make target `lbfgs.lo', needed by `libmitlm.la'. Stop.
make[1]: Leaving directory `/home/stephen/mitlm2'
make: *** [all-recursive] Error 1
This can be fixed by editing the Makefile produced by autogen.sh:
--- Makefile.orig 2011-03-08 14:19:53.000000000 +0200
+++ Makefile 2011-03-08 14:20:26.000000000 +0200
@@ -75,7 +75,7 @@
src/Vocab.lo src/PerplexityOptimizer.lo src/Lattice.lo \
src/Smoothing.lo src/NgramModel.lo src/NgramVector.lo \
src/MaxLikelihoodSmoothing.lo src/KneserNeySmoothing.lo \
- src/InterpolatedNgramLM.lo lbfgs.lo lbfgsb.lo \
+ src/InterpolatedNgramLM.lo src/optimize/lbfgs.lo src/optimize/lbfgsb.lo \
src/WordErrorRateOptimizer.lo
libmitlm_la_OBJECTS = $(am_libmitlm_la_OBJECTS)
binPROGRAMS_INSTALL = $(INSTALL_PROGRAM)
but I don't know what the underlying cause is.
What version of the product are you using? On what operating system?
trunk (r48)
Linux srvslngrd003.uct.ac.za 2.6.18-194.3.1.el5 #1 SMP Fri May 7 01:43:09 EDT
2010 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Scientific Linux SL release 5.4 (Boron)
Please provide any additional information below.
autoconf (GNU Autoconf) 2.59
Original issue reported on code.google.com by [email protected]
on 8 Mar 2011 at 12:27
Attachments:
What steps will reproduce the problem?
1.
2.
3.
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 23 Dec 2014 at 12:16
It seems that make is trying to use a libtool option that does not exist.
What can I do to fix it?
What steps will reproduce the problem?
1. type: ./autogen.sh --prefix=$(pwd)/usr
2. type: make
What is the expected output? What do you see instead?
This should successfully make the program.
What version of the product are you using? On what operating system?
Revision 48
Please provide any additional information below.
Here is the exact bug in the make process:
/bin/bash ./libtool --tag=F77 --mode=compile -c -o src/optimize/lbfgs.lo
src/optimize/lbfgs.f
libtool: compile: unrecognized option `-c'
libtool: compile: Try `libtool --help' for more information.
make[1]: *** [src/optimize/lbfgs.lo] Error 1
make[1]: Leaving directory `/home/myname/workspace/sphinx/mitlm/mitlm-read-only'
make: *** [all-recursive] Error 1
Original issue reported on code.google.com by [email protected]
on 13 Jul 2011 at 8:44
What steps will reproduce the problem?
1.
$ make
...
libtool: compile: g++ "-DPACKAGE_NAME=\"MIT Language Modeling Toolkit\""
-DPACKAGE_TARNAME=\"mitlm\" -DPACKAGE_VERSION=\"0.4.1\" "-DPACKAGE_STRING=\"MIT
Language Modeling Toolkit 0.4.1\""
-DPACKAGE_BUGREPORT=\"[email protected]\" -DPACKAGE_URL=\"\"
-DPACKAGE=\"mitlm\" -DVERSION=\"0.4.1\" "-DF77_FUNC(name,NAME)=name ## _"
"-DF77_FUNC_(name,NAME)=name ## _" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_STRING_H=1 -DHAVE_MATH_H=1
-DSTDC_HEADERS=1 -DHAVE_STDLIB_H=1 -DHAVE_MALLOC=1 -DHAVE_STDLIB_H=1
-DHAVE_REALLOC=1 -I. -I./src -g -O2 -MT src/NgramModel.lo -MD -MP -MF
src/.deps/NgramModel.Tpo -c src/NgramModel.cpp -fno-common -DPIC -o
src/.libs/NgramModel.o
src/NgramModel.cpp: In member function 'void
mitlm::NgramModel::LoadLM(std::vector<mitlm::DenseVector<double>,
std::allocator<mitlm::DenseVector<double> > >&,
std::vector<mitlm::DenseVector<double>,
std::allocator<mitlm::DenseVector<double> > >&, mitlm::ZFile&)':
src/NgramModel.cpp:325: error: call of overloaded 'pow(int, int)' is ambiguous
/usr/include/math.h:436: note: candidates are: double pow(double, double)
/usr/include/c++/4.2.1/cmath:357: note: float std::pow(float,
float)
/usr/include/c++/4.2.1/cmath:361: note: long double
std::pow(long double, long double)
/usr/include/c++/4.2.1/cmath:365: note: double std::pow(double,
int)
/usr/include/c++/4.2.1/cmath:369: note: float std::pow(float,
int)
/usr/include/c++/4.2.1/cmath:373: note: long double
std::pow(long double, int)
make[1]: *** [src/NgramModel.lo] Error 1
make: *** [all-recursive] Error 1
$
I fixed it this way:
325c325
< assert(prob <= std::pow(10, -99));
---
> assert(prob <= std::pow(10.0, -99));
Original issue reported on code.google.com by mamadontgodaddycomehome
on 21 Nov 2013 at 10:52
What steps will reproduce the problem?
1. ./autogen.sh
2. make
Error messages:
src/NgramModel.cpp:325: error: call of overloaded 'pow(int, int)' is ambiguous
/usr/include/bits/mathcalls.h:154: note: candidates are: double pow(double,
double)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/cmath:345:
note: float std::pow(float, float)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/cmath:349:
note: long double std::pow(long double, long double)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/cmath:353:
note: double std::pow(double, int)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/cmath:357:
note: float std::pow(float, int)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/cmath:361:
note: long double std::pow(long double, int)
make[1]: *** [src/NgramModel.lo] Error 1
make[1]: Leaving directory `/home/wyz/lm/mitlm-0.4.1'
make: *** [all-recursive] Error 1
Solution
Changed src/NgramModel.cpp:325 from
assert(prob <= std::pow(10, -99));
to
assert(prob <= std::pow(10.0, -99.0));
Again, run `make'.
Error message
make[1]: *** No rule to make target `lbfgs.lo', needed by `libmitlm.la'. Stop.
make[1]: Leaving directory `/home/wyz/lm/mitlm-0.4.1'
make: *** [all-recursive] Error 1
Solution:
according to http://code.google.com/p/mitlm/issues/detail?id=26
Change Makefile and Makefile.in
Error message
./.libs/libmitlm.so: undefined reference to `__cxa_get_exception_ptr'
collect2: ld returned 1 exit status
make[1]: *** [evaluate-ngram] Error 1
make[1]: Leaving directory `/home/wyz/lm/mitlm-0.4.1'
make: *** [all-recursive] Error 1
Then I have no idea about how to solve this problem.
What version of the product are you using? On what operating system?
mitlm-0.4.1.tar.gz
Description: Red Hat Enterprise Linux Server release 5.8 (Tikanga)
autoconf (GNU Autoconf) 2.69 (the original version is 2.59)
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 16 Jun 2014 at 4:34
What steps will reproduce the problem?
Run estimate-ngram -v vocab.file -t training.txt -wl lm.arpa, where vocab.file
contains a subset of the vocab in the LM training text.
What is the expected output? What do you see instead?
Program output:
0.000 Loading vocab vocab...
0.010 Loading corpus training.txt...
Segmentation fault
What version of the product are you using? On what operating system?
Latest SVN, ubuntu 10.04 64bit
Original issue reported on code.google.com by [email protected]
on 14 Jul 2011 at 4:34
What steps will reproduce the problem?
The project summary page at http://code.google.com/p/mitlm/ mentions a
dependency on the Boost C++ libraries, but mitlm no longer seems to require
these.
What is the expected output? What do you see instead?
Reference to Boost should be removed from project documentation.
What version of the product are you using? On what operating system?
r48 (trunk)
Original issue reported on code.google.com by [email protected]
on 8 Mar 2011 at 12:14
Dear Sir,
I would like to make a request.
Can you modify mitlm to take Unicode files as input?
Original issue reported on code.google.com by [email protected]
on 22 Jan 2010 at 6:07
On the tutorial wiki page (http://code.google.com/p/mitlm/wiki/Tutorial) it
is written that a language model with an <unk> symbol for out-of-vocabulary
words can be estimated with this command:
estimate-ngram -v CS.vocab -unk -t Lectures.txt -wl Lectures.CS.unk.lm
but that does not work. You have to add T, t, 1, TRUE, or true after -unk:
estimate-ngram -v CS.vocab -unk true -t Lectures.txt -wl Lectures.CS.unk.lm
Original issue reported on code.google.com by [email protected]
on 24 Feb 2010 at 3:30
Upon running interpolate-ngram twice, I received the same error twice after
the 46th iteration. When I did this with one fewer component LM, it went
through 188 iterations without error.
interpolate-ngram -i GLI -op dev.txt -wl GLI.lm -if
"log:sumhist:%s.effcounts" -o 6 -l "1.lm, 2.lm, 3.lm, 4.lm, 5.lm"
Optimizing 9 parameters...
IFLAG= -1
LINE SEARCH FAILED. SEE DOCUMENTATION OF ROUTINE MCSRCH
ERROR RETURN OF LINE SEARCH: INFO= 3
POSSIBLE CAUSES: FUNCTION OR GRADIENT ARE INCORRECT
OR INCORRECT TOLERANCES
Iterations = 46
Elapsed Time = 516.460000
Perplexity = 28.741343
Original issue reported on code.google.com by [email protected]
on 27 Feb 2009 at 10:56
When creating unigram LMs, using word features, and trying to optimize on
a dev set, I get a segmentation fault:
$ estimate-ngram -v etc/vocab -unk 1 -o 1 -t train.txt -wl tmp.arpa.gz -wf
entropy:%s.txt -op tmp.txt
Replace unknown words with <unk>...
Loading vocab etc/vocab...
Loading corpus train.txt...
Loading weight features entropy:train.txt...
Smoothing[1] = ModKN
Set smoothing algorithms...
Loading development set tmp.txt...
Segmentation fault
The same line works fine with -o 2.
I'm using MITLM from SVN.
Original issue reported on code.google.com by [email protected]
on 26 Feb 2009 at 2:13
What steps will reproduce the problem?
1. Compile with gcc 4.x on a Linux platform
2. Run with gzip files on Linux
What is the expected output? What do you see instead?
ZFile._file is null because popen() fails
What version of the product are you using? On what operating system?
0.4 running on ubuntu 8.04
Please provide any additional information below.
The fix is to change the modes "rb" and "wb" to "r" and "w". popen()
implementations do not always support "rb" or "wb" modes (see:
http://opengroup.org/onlinepubs/007908775/xsh/popen.html).
Original issue reported on code.google.com by [email protected]
on 13 May 2009 at 12:11
I have several arpa-formatted language models, and I'd like to mix these LMs
with a list of numerical weights (proportion) using interpolate-ngram, e.g.,
[0.3, 0.4, 0.1, 0.2]. I have been looking in the tutorial but could not find an
option for this. Is that implemented yet? Is there any alternative way to do
this (mix LMs with numerical weights)?
What version of the product are you using? On what operating system?
0.4.1, Linux 3.13.0-39-generic
Thanks!
Original issue reported on code.google.com by [email protected]
on 19 May 2015 at 11:18
What steps will reproduce the problem?
1. Compile trunk (r48)
2. Download an arpa LM e.g. the LM in
http://www.keithv.com/software/giga/lm_giga_64k_nvp_3gram.zip
3. Run evaluate-ngram which loads the model, e.g.
evaluate-ngram -lm lm_giga_64k_nvp_3gram.arpa -eval-perp data.txt
What is the expected output? What do you see instead?
0.001 Loading LM lm_giga_64k_nvp_3gram.arpa...
Assertion failed: (p >= &line[lineLen]), function LoadLM, file
src/NgramModel.cpp, line 329.
Abort trap
What version of the product are you using? On what operating system?
r48, MacOSX
Darwin stephen-marquards-macbook-pro.local 10.6.0 Darwin Kernel Version 10.6.0:
Wed Nov 10 18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386 i386
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 9 Mar 2011 at 1:01
When I run the following command I get many "Feature skipped" error
messages. I previously populated the effcounts files with estimate-ngram -t
-wec for each model, and they look fine. What do these error messages mean?
I also tried this with CM, and got many of the "Feature skipped" messages
and a "feature read from..." message for each model.
interpolate-ngram -lm "model1.lm, model2.lm, model3.lm" -interpolation GLI
-op dev.txt -wl GLI.lm -if
"log:sumhist:%s.effcounts,pow2:log1p:sumhist:%s.effcounts"
Thanks
Original issue reported on code.google.com by [email protected]
on 26 Feb 2009 at 11:27
1)
set option = "-i LI -opt-alg LBFGSB "
interpolate-ngram "$trainingdata,$adaptdata" $option -wl $trigram -op $devset
-eval-perp $testset
2)
set option = "-i CM -opt-alg LBFGSB "
interpolate-ngram -c "$count21,$count22" $option
-if "log:sumhist:$effcount21;log:sumhist:$effcount22" -wl $trigram -op $devset
-eval-perp $testset
Both of the above methods create many nan backoffs in the output LM.
However, their perplexities seem OK.
If -op $devset is not used, the nans are not created, but the perplexities of
"CM" and "GLI" are more than double that of "LI".
What version of the product are you using? On what operating system?
MITLM 0.4, on CentOS 4.7
Original issue reported on code.google.com by [email protected]
on 10 Nov 2010 at 8:15
After updating from SVN, both LI and CM interpolation seem to be broken: in
the interpolated LM, there are many "nans" and most back-off weights are zero.
Sample interpolated LM:
\data\
ngram 1=199992
ngram 2=865062
ngram 3=2657490
ngram 4=4259246
\1-grams:
-1.564484 </s>
-99 <s> nan
[...]
-5.898262 Abadan
-6.074985 Abadi
-6.242569 Abadia 0.000000
-6.105848 Abadie
-6.242569 Abadou 0.000000
[...]
\2-grams:
nan </s> -t-il -0.019559
-2.477506 </s> <UNK> -0.299104
nan </s> A -0.020696
nan </s> A.
nan </s> A.B.
nan </s> A.K.
nan </s> ABM
nan </s> ACF
nan </s> AFP -0.045175
The source LMs (estimated with estimate-ngram) seem to be OK.
Original issue reported on code.google.com by [email protected]
on 16 Dec 2008 at 10:24
What steps will reproduce the problem?
1. Create an LM with estimate-ngram and the eval-perp param
2. Use evaluate-ngram with eval-perp on the same LM
3. Perplexity results differ
What is the expected output? What do you see instead?
evaluate-ngram -lm rlst8-similar.lm -eval-perp "$TRANSCRIPT_CONT,
$TRANSCRIPT_SENT"
0.001 Loading LM rlst8-similar.lm...
7.262 Perplexity Evaluations:
7.262 Loading eval set
/data/src/sphinx/experiments/transcripts/rlst-transcript.corpus...
7.318 /data/src/sphinx/experiments/transcripts/rlst-transcript.corpus 385.071
7.322 Loading eval set
/data/src/sphinx/experiments/transcripts/rlst-transcript.sentences...
7.376 /data/src/sphinx/experiments/transcripts/rlst-transcript.sentences 312.224
$ estimate-ngram -unk 1 -vocab $VOCAB_AUGMENTED -text $SENTENCE_CORPUS -wl
$LM_SIMILAR -eval-perp "$TRANSCRIPT_CONT, $TRANSCRIPT_SENT"
0.001 Replace unknown words with <unk>...
0.001 Loading vocab rlst8-merged-vocab.txt...
0.013 Loading corpus sentences.similar.corpus...
10.127 Smoothing[1] = ModKN
10.127 Smoothing[2] = ModKN
10.127 Smoothing[3] = ModKN
10.127 Set smoothing algorithms...
10.243 Estimating full n-gram model...
10.459 Saving LM to rlst8-similar.lm...
14.192 Perplexity Evaluations:
14.192 Loading eval set
/data/src/sphinx/experiments/transcripts/rlst-transcript.corpus...
14.351 /data/src/sphinx/experiments/transcripts/rlst-transcript.corpus 377.913
14.359 Loading eval set
/data/src/sphinx/experiments/transcripts/rlst-transcript.sentences...
14.516 /data/src/sphinx/experiments/transcripts/rlst-transcript.sentences 307.090
I would expect the two sets of perplexity results to be the same.
The difference appears to arise from use of the "-unk" parameter. Without it
(i.e. when the LM excludes <unk>), the perplexity results from estimate-ngram
and evaluate-ngram are the same.
What version of the product are you using? On what operating system?
r48
MacOS X 10.6.1
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 4 Jun 2011 at 7:42
Hi, I am attempting to compile, and after installing gcc-gfortran version
4.4.1 I get an error when compiling lbfgsb.f:
make: f77: Command not found
Is there some way to tell Linux or GCC that the Fortran compiler is not f77?
Thanks
Hal
Original issue reported on code.google.com by [email protected]
on 5 Nov 2009 at 12:56
Hi, I have a problem with smoothing large files of 3-gram counts.
I use the command estimate-ngram -order 3 -counts allgrams -smoothing FixModKN -wl
allgrams.FixModKN.lm and I get this error:
Saving LM to train.corpus.lm...
estimate-ngram: src/NgramModel.cpp:422: void NgramModel::SaveLM(const
std::vector<DenseVector<double>, std::allocator<DenseVector<double> > >&, const
std::vector<DenseVector<double>, std::allocator<DenseVector<double> > >&,
ZFile&) const: Assertion `(size_t)(ptr - lineBuffer.data()) <
lineBuffer.size()' failed.
Earlier I tried 2-grams with a 4.7 GB file and it worked fine. The 3-gram
file is 20 GB.
My operating system is GNU/Linux x86_64 with 96 GB RAM.
Original issue reported on code.google.com by [email protected]
on 15 Nov 2012 at 4:48
Hi,
I've created configuration files for autotools and Debian packages. Are you interested?
1) Here are quick instructions to patch the revision 41 of the code (you have
to download the attached patch file). Please note that I needed to apply a few
fixes to the source in order to be able to compile mitlm on Debian squeeze:
svn checkout -r 41 http://mitlm.googlecode.com/svn/trunk/ mitlm-read-only
cd mitlm-read-only
patch -p0 < ../autotools_debian_compilation.diff
chmod 755 autogen.sh debian/rules
touch NEWS AUTHORS ChangeLog debian/info
svn add configure.ac autogen.sh Makefile.am NEWS AUTHORS ChangeLog debian
svn remove Makefile
svn move LICENSE COPYING
2) Once you have installed autoconf, automake, and libtool, you can use this
command line to generate the configure script and run it:
./autogen.sh --prefix=$(pwd)/usr
3) After this you can run "make dist-gzip" to create the mitlm-0.4.tar.gz
archive for distribution (users of this package can simply run ./configure &&
make && make install). You can also run "dpkg-buildpackage -rfakeroot" to
create Debian packages for your architecture (it will create 4 packages, one
for the binaries and three for the library).
4) To change the package version, just edit the configure.ac file and re-run
steps 2 and 3.
Best regards.
Original issue reported on code.google.com by [email protected]
on 16 Nov 2010 at 9:11
Attachments: