I am using Morfessor with the word count generated from Wikipedia. I noticed that the

Morfessor Models Sizes about morfessor HOT 8 CLOSED

aalto-speech commented on July 23, 2024

from morfessor.

Comments (8)

svirpioj commented on July 23, 2024

This is typical behavior for Morfessor models: the larger the training data, the larger the morph lexicon and the longer the average morph length.

There are several options to reduce the model size:

Train the model on word types (-d ones), if you are not doing so already.
Set the corpus weight parameter below one (e.g. -w 0.1).
Discard low-frequency words from training (e.g. --batch-minfreq 2). This is especially useful if the data is noisy (likely not the case with Wikipedia data).
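As a rough illustration of what the first and third options do to the training data, here is a minimal sketch in plain Python. It does not call Morfessor itself; the function name `prepare_training_counts` and the sample words are made up for this example. It assumes that -d ones makes every word type count once and that --batch-minfreq N drops words seen fewer than N times. The corpus weight (-w) is not modeled here, since it changes the cost function rather than the input data.

```python
from collections import Counter


def prepare_training_counts(word_counts, min_freq=1, use_types=False):
    """Return the word counts a trainer would see after filtering and dampening.

    min_freq  -- hypothetical stand-in for --batch-minfreq: drop rarer words
    use_types -- hypothetical stand-in for -d ones: every kept word counts once
    """
    prepared = {}
    for word, count in word_counts.items():
        if count < min_freq:
            continue  # discard low-frequency (possibly noisy) words
        prepared[word] = 1 if use_types else count
    return prepared


# Toy word counts; "wakled" stands in for a rare typo in the corpus.
raw = Counter({"the": 1000, "walking": 12, "walked": 7, "wakled": 1})
filtered = prepare_training_counts(raw, min_freq=2, use_types=True)
print(filtered)
```

With fewer distinct words and no frequency information, the trainer has less reason to store long, frequent words as single morphs, which is why these options tend to shrink the lexicon.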

More details and discussion can be found, for example, in this article: http://dspace.utlib.ee/dspace/handle/10062/17313

If your concern is just the size of the model file, you can try saving it in gzipped Morfessor 1.0 format. It is slower to load and doesn't store any training parameters, but it should be smaller.
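To give an idea of what a gzipped text model amounts to, here is a small sketch using only the Python standard library. It assumes the Morfessor 1.0 text layout is one word per line, written as a count followed by the morphs joined with " + "; the helper names and the file name are made up for this example. In practice you would let Morfessor's own I/O write and read the file.

```python
import gzip
import os
import tempfile


def write_segmentations_gz(path, segmentations):
    """Write (count, morphs) pairs as gzip-compressed text, one word per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for count, morphs in segmentations:
            f.write(f"{count} {' + '.join(morphs)}\n")


def read_segmentations_gz(path):
    """Read the file back into a list of (count, morphs) pairs."""
    result = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            count, _, rest = line.rstrip("\n").partition(" ")
            result.append((int(count), rest.split(" + ")))
    return result


# Toy segmentations; the line layout is an assumption, see above.
segs = [(12, ["walk", "ing"]), (7, ["walk", "ed"]), (1000, ["the"])]
path = os.path.join(tempfile.mkdtemp(), "model.segm.gz")
write_segmentations_gz(path, segs)
restored = read_segmentations_gz(path)
```

Only the words, their counts, and their segmentations survive the round trip, which matches the caveat that no training parameters are stored.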

psmit commented on July 23, 2024

In the next version (which will be released in the coming months), there will be an option for storing a reduced model: a model that can only be used for segmenting data.

aboSamoor commented on July 23, 2024

Is there any progress on the issue of reducing the size of the trained models?

psmit commented on July 23, 2024

Yes, we have implemented reduced models, and we have been using them internally for a long time. The release of Morfessor 2.1 should come soon, but until then you can already use this branch: https://github.com/phsmit/morfessor/tree/develop

On the command line there is the --save-reduced option; in the code it is model.make_segment_only().

aboSamoor commented on July 23, 2024

It does indeed reduce the size of the models. It seems the option is already available in the PyPI package (Morfessor 2.0.2alpha1). Is it necessary to use this development branch?

Once I have trained a model, can I use the PyPI version to actually segment, or do I still need the development branch to segment text?

I am developing a package that will use Morfessor as the backend for text segmentation, and I would like to use the PyPI package to manage my dependencies.

psmit commented on July 23, 2024

Ah, indeed. No need to use the development branch. The models from the develop and alpha branches should be interchangeable, but I can't guarantee it. We are thinking of more persistent models, but it is not easy...

bhashi12 commented on July 23, 2024

I've just downloaded Morfessor-2.0.2a4 on Ubuntu. I could not load a Morfessor 1.0 style text model; it throws an error of "no such directory exist". Where can I find this file?

psmit commented on July 23, 2024

@bhashi12 Sorry, I had not seen this question before. If it still persists, would you open a new issue?
