

-mx1000m not appropriate for Google n-grams

adampauls commented on September 26, 2024


from berkeleylm.

Comments (4)

GoogleCodeExporter commented on September 26, 2024

Sorry, I don't know how I missed this bug report for so long! Not sure what 
happened. 

Are you actually talking about running on the full Google n-grams corpus? If
so, you need a substantial amount of memory, much more than the 10GB needed
to store the n-grams once the binary is built. I haven't worked out what the
minimum is, but I would expect you to need at least 50GB of memory, which is
available on large EC2 instances.
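
For a sense of scale, here is a minimal Python sketch (not part of berkeleylm) that converts a JVM max-heap flag such as `-mx1000m` into bytes and compares it with the rough 50GB figure above. The helper names `heap_flag_bytes` and `is_enough_for_full_build` are hypothetical, the 50GB threshold is only the unmeasured estimate from this comment, and the sketch assumes `-mx` is accepted as the legacy spelling of `-Xmx`.

```python
import re

# Binary multipliers for the k/m/g suffixes used on JVM heap sizes.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

# Assumption: 50 GB is the rough, unmeasured estimate from the comment above.
ESTIMATED_BUILD_BYTES = 50 * 1024**3


def heap_flag_bytes(flag):
    """Return the heap size in bytes requested by a flag like -mx1000m or -Xmx50g."""
    match = re.fullmatch(r"-(?:Xmx|mx)(\d+)([kKmMgG]?)", flag)
    if match is None:
        raise ValueError(f"not a max-heap flag: {flag!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def is_enough_for_full_build(flag):
    """Compare the requested heap with the estimated requirement."""
    return heap_flag_bytes(flag) >= ESTIMATED_BUILD_BYTES


if __name__ == "__main__":
    for flag in ("-mx1000m", "-Xmx50g"):
        print(flag, heap_flag_bytes(flag), is_enough_for_full_build(flag))
```

Under these assumptions, `-mx1000m` requests roughly 1GB of heap, far short of the estimate, while a setting like `-Xmx50g` would meet it.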

However, I have pre-built binaries of these available, so you can just
download those (instructions are on the web page).

Original comment by [email protected] on 19 Feb 2012 at 5:48


GoogleCodeExporter commented on September 26, 2024

Original comment by [email protected] on 9 Aug 2012 at 5:30

Changed state: WontFix


GoogleCodeExporter commented on September 26, 2024

How long does it take to build the LM on the full n-grams corpus?

Original comment by [email protected] on 19 Aug 2012 at 9:33


GoogleCodeExporter commented on September 26, 2024

I think it takes something on the order of 24 hours, maybe a little less. It's
not something I've optimized heavily, so sorry about that. Let me know if you
have any trouble building it yourself (other than time and memory issues...).

Original comment by [email protected] on 20 Aug 2012 at 6:51

