
Original comment by adpa...@gmail.com on 20 Nov 2011

How to train on Google n-grams [berkeleylm, CLOSED]

adampauls commented on September 28, 2024

How to train on Google n-grams

from berkeleylm.

Comments (3)

GoogleCodeExporter commented on September 28, 2024

Hi, Joseph, thanks for your question.

The format should be as in the example directory: 
test/edu/berkeley/nlp/lm/io/googledir

Specifically, the directory should look like:

# 1gms/vocab_cs.gz [here, vocab_cs.gz should have the unigram frequencies sorted in decreasing order of frequency]
# 2gms/2gm-0001.gz 2gm-0002.gz …
# 3gms/3gm-0001.gz 3gm-0002.gz … 
# ...
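In case it helps, here is a rough sketch of a check for that layout before handing the directory to the example script. It is not part of berkeleylm: `check_google_dir` is a made-up helper, the `max_order=5` default and the `<n>gm-<digits>.gz` file pattern are assumptions read off the listing above, and it only looks at names, not at file contents.

```python
import re
from pathlib import Path


def check_google_dir(root, max_order=5):
    """Return a list of layout problems (empty if the layout looks right).

    Hypothetical helper: only file and directory names are checked.
    """
    root = Path(root)
    problems = []

    # 1gms must contain the frequency-sorted vocabulary file.
    if not (root / "1gms" / "vocab_cs.gz").is_file():
        problems.append("missing 1gms/vocab_cs.gz")

    # Orders 2..max_order each need their own directory with at least one
    # file named like '<n>gm-0001.gz' (assumed naming pattern).
    for order in range(2, max_order + 1):
        order_dir = root / f"{order}gms"
        if not order_dir.is_dir():
            problems.append(f"missing directory {order}gms")
            continue
        pattern = re.compile(rf"^{order}gm-\d+\.gz$")
        if not any(pattern.match(p.name) for p in order_dir.iterdir()):
            problems.append(f"no {order}gm-NNNN.gz files in {order}gms")

    return problems
```

Calling `check_google_dir("path/to/your/googledir")` should give back an empty list when every expected directory and file name is present.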

Given your directory structure, you will need to create an [n]gms directory for 
each n=1..5, and then copy or soft-link all files of each order into the 
corresponding [n]gms directory. You might also need to create vocab_cs.gz 
yourself by sorting the unigram file, though at least the English distribution 
already ships with it (in 1gms). 
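The two steps above could be scripted roughly as follows. This is only a sketch under assumptions, not something that ships with berkeleylm: the function names are made up, the input files are assumed to sit in one flat directory and be named `<n>gm-*.gz`, and the unigram file is assumed to hold one `word<TAB>count` pair per line.

```python
import gzip
import os
import re
from pathlib import Path


def link_ngram_files(flat_dir, out_dir, max_order=5):
    """Soft-link each '<n>gm-*.gz' file in flat_dir into out_dir/<n>gms/."""
    flat_dir, out_dir = Path(flat_dir), Path(out_dir)
    pattern = re.compile(r"^(\d+)gm-.*\.gz$")  # assumed naming scheme

    # Create 1gms .. <max_order>gms up front, even if some stay empty.
    for order in range(1, max_order + 1):
        (out_dir / f"{order}gms").mkdir(parents=True, exist_ok=True)

    linked = []
    for src in sorted(flat_dir.iterdir()):
        match = pattern.match(src.name)
        if not match or not 1 <= int(match.group(1)) <= max_order:
            continue  # skip anything that is not an n-gram file
        dest = out_dir / f"{match.group(1)}gms" / src.name
        if not (dest.is_symlink() or dest.exists()):
            os.symlink(src.resolve(), dest)
        linked.append(dest)
    return linked


def write_vocab_cs(unigram_gz, vocab_cs_gz):
    """Write the unigrams sorted by decreasing count; return how many."""
    entries = []
    with gzip.open(unigram_gz, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            # Assumed line format: word<TAB>count
            word, count = line.rsplit("\t", 1)
            entries.append((word, int(count)))

    # Stable sort: words with equal counts keep their input order.
    entries.sort(key=lambda item: -item[1])

    with gzip.open(vocab_cs_gz, "wt", encoding="utf-8") as handle:
        for word, count in entries:
            handle.write(f"{word}\t{count}\n")
    return len(entries)
```

For example, `link_ngram_files("downloads", "googledir")` followed by `write_vocab_cs("downloads/unigrams.gz", "googledir/1gms/vocab_cs.gz")`, where both paths are placeholders for wherever your files actually live. The sort loads the whole unigram file into memory, which should be fine for a vocabulary file but would not scale to the higher-order files.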

I have added additional documentation about this to the example script for the 
next release.

Original comment by [email protected] on 20 Nov 2011 at 5:00


GoogleCodeExporter commented on September 28, 2024

Original comment by [email protected] on 20 Nov 2011 at 5:00

Changed state: Fixed
Added labels: Type-Other
Removed labels: Type-Defect


GoogleCodeExporter commented on September 28, 2024

Thanks, that worked great.

Original comment by [email protected] on 24 Nov 2011 at 2:56

