Trying to build a language model on higher-order n-grams. (berkeleylm, closed)

adampauls commented on September 28, 2024

Trying to build a language model on higher-order n-grams.

from berkeleylm.

Comments (3)

GoogleCodeExporter commented on September 28, 2024

Wanted to add that the NullPointerException persists even when I use the 
original vocab_cs.gz file (instead of the dummy empty file that I initially 
tried).

Original comment by [email protected] on 21 Nov 2014 at 4:25

GoogleCodeExporter commented on September 28, 2024

Okay, so I have been debugging for a few hours now, with no success yet. Here's 
a toy dataset I created for my debugging efforts. I'm sharing it in case it 
helps. As far as I can see, it stays true to the Google n-gram format, but 
after the n-grams are added, the same NullPointerException is thrown:

120 missing suffixes or prefixes were found, doing another pass to add n-grams {
Exception in thread "main" java.lang.NullPointerException
    at edu.berkeley.nlp.lm.io.LmReaders.buildMapCommon(LmReaders.java:473)
    at edu.berkeley.nlp.lm.io.LmReaders.secondPassGoogle(LmReaders.java:417)
    at edu.berkeley.nlp.lm.io.LmReaders.readLmFromGoogleNgramDir(LmReaders.java:228)
    at edu.berkeley.nlp.lm.io.LmReaders.readLmFromGoogleNgramDir(LmReaders.java:204)

Original comment by [email protected] on 21 Nov 2014 at 7:57

Attachments:

testdata-ngrams.tar.gz

GoogleCodeExporter commented on September 28, 2024

I don't intend to support this use case. The code assumes that the lower-order 
n-grams are available for every higher-order n-gram. If you manage to get this 
working yourself, let me know and I'd be happy to patch things in!

Original comment by [email protected] on 6 Dec 2014 at 11:51
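
For anyone hitting the same error, the assumption described above can be checked on a dataset before it is handed to the reader. The following is a minimal sketch, not berkeleylm's own logic: it assumes the n-grams fit in memory as space-separated strings, and the class name `NgramClosureCheck` and the method `missingContexts` are hypothetical names made up for this illustration. It reports every prefix or suffix (n-1)-gram that some higher-order n-gram needs but the set does not contain.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; it does not use any berkeleylm API.
public class NgramClosureCheck {

    /**
     * Returns the prefix and suffix (n-1)-grams that are required by some
     * n-gram in the set but are not themselves present in the set.
     * This is a single pass: the n-grams it reports may in turn need
     * lower-order n-grams that are also missing.
     */
    static Set<String> missingContexts(Set<String> ngrams) {
        Set<String> missing = new TreeSet<>();
        for (String ngram : ngrams) {
            String[] words = ngram.trim().split("\\s+");
            if (words.length < 2) {
                continue; // a unigram has no lower-order n-gram to depend on
            }
            // Prefix drops the last word; suffix drops the first word.
            String prefix = String.join(" ", Arrays.copyOfRange(words, 0, words.length - 1));
            String suffix = String.join(" ", Arrays.copyOfRange(words, 1, words.length));
            if (!ngrams.contains(prefix)) {
                missing.add(prefix);
            }
            if (!ngrams.contains(suffix)) {
                missing.add(suffix);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Toy data (made up): the trigram is present, but the bigram "cat sat" is not.
        Set<String> ngrams = new LinkedHashSet<>(
                Arrays.asList("the", "cat", "sat", "the cat", "the cat sat"));
        System.out.println("Missing lower-order n-grams: " + missingContexts(ngrams));
    }
}
```

A dataset that contains only higher-order n-grams would report a missing entry for nearly every line, which matches the use case the maintainer says is unsupported. Adding the reported n-grams and rerunning the check until it returns an empty set is one way to close the gaps, though which counts to assign to the added entries is left open here.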