Error training language model (texta, 5 comments, closed)

andopaju commented on July 20, 2024

Error training language model

Comments (5)

rsirel commented on July 20, 2024

How many documents do you have in your dataset? This is a symptom of having too few documents to build the Word2Vec model.

andopaju commented on July 20, 2024

It's true that the error came up while testing with only one document.
But the cause seems to be that the first ES_SCROLL_SIZE rows are discarded altogether and never used for training.
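A minimal sketch of the failure mode described here, assuming the training loop issues an initial search and then scrolls again before reading that first page. `FakeScrollClient` and `collect_buggy` are hypothetical stand-ins written for illustration, not texta's code; the fake client only mimics the shape of the Elasticsearch responses quoted later in this thread.

```python
class FakeScrollClient:
    """Hypothetical in-memory stand-in for an Elasticsearch client's search/scroll calls."""

    def __init__(self, docs, size):
        self.docs = list(docs)
        self.size = size  # plays the role of ES_SCROLL_SIZE
        self.offset = 0

    def _page(self):
        # Return the next batch of hits in the same shape as a scroll response.
        hits = self.docs[self.offset:self.offset + self.size]
        self.offset += len(hits)
        return {"_scroll_id": "fake-id", "hits": {"total": len(self.docs), "hits": hits}}

    def search(self):
        self.offset = 0
        return self._page()

    def scroll(self, scroll_id):
        return self._page()


def collect_buggy(client):
    """Scrolls BEFORE reading the current page, so the first batch is thrown away."""
    response = client.search()
    scroll_id = response["_scroll_id"]
    seen = []
    while True:
        # Bug: the hits returned by the initial search are overwritten here, never read.
        response = client.scroll(scroll_id=scroll_id)
        scroll_id = response["_scroll_id"]
        hits = response["hits"]["hits"]
        if not hits:
            break
        seen.extend(hits)
    return seen


if __name__ == "__main__":
    # One document and a batch size larger than the index: nothing reaches training.
    print(len(collect_buggy(FakeScrollClient(["doc1"], size=500))))  # 0
    # Five documents, batch size 2: the first two are silently dropped.
    print(collect_buggy(FakeScrollClient(list("abcde"), size=2)))  # ['c', 'd', 'e']
```

With a single document, the first (and only) batch is the one discarded, so the model receives no data at all, which is consistent with the error showing up on a one-document dataset.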

andopaju commented on July 20, 2024

Can't reproduce anymore; most likely a false alarm.

andopaju commented on July 20, 2024

Nope, it is definitely a bug.
If I comment out these lines:

#response = self.es_m.scroll(scroll_id=scroll_id)
#scroll_id = response['_scroll_id']

and then add

total_hits = 0

then training completes successfully.

First response (before the while loop):
{"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAYoFklGcnVXVE8wUlBTb0h4bnNJcnBaZlEAAAAAAAAGKxZJRnJ1V1RPMFJQU29IeG5zSXJwWmZRAAAAAAAABioWSUZydVdUTzBSUFNvSHhuc0lycFpmUQAAAAAAAAYsFklGcnVXVE8wUlBTb0h4bnNJcnBaZlEAAAAAAAAGKRZJRnJ1V1RPMFJQU29IeG5zSXJwWmZR", "_shards": {"total": 5, "skipped": 0, "successful": 5, "failed": 0}, "hits": {"total": 1, "max_score": 1.0, "hits": [{"_type": "ariseadustik", "_id": "3DrvvGgBLr6SwCRtWKxk", "_index": "ariseadustik", "_score": 1.0, "_source": {"text": "V\u00e4ljaandja: Riigikogu\nAkti liik: seadus\n...

Second response, now inside the while loop (note that there are no hits):
{"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAYoFklGcnVXVE8wUlBTb0h4bnNJcnBaZlEAAAAAAAAGKxZJRnJ1V1RPMFJQU29IeG5zSXJwWmZRAAAAAAAABioWSUZydVdUTzBSUFNvSHhuc0lycFpmUQAAAAAAAAYsFklGcnVXVE8wUlBTb0h4bnNJcnBaZlEAAAAAAAAGKRZJRnJ1V1RPMFJQU29IeG5zSXJwWmZR", "_shards": {"total": 5, "skipped": 0, "successful": 5, "failed": 0}, "hits": {"total": 1, "max_score": 1.0, "hits": []}, "timed_out": false, "took": 1}}

Attached example data file: 120122018002.txt
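The second response has `hits.total` equal to 1 but an empty `hits` list: the scroll is already exhausted, because its single hit arrived in the first response. A sketch of a loop that avoids losing that first page, assuming a client with duck-typed `search()` and `scroll(scroll_id=...)` methods; `FakeScrollClient` and `iterate_hits` are hypothetical names, and the in-memory client only imitates the response shape. The loop reads the hits of the current response before scrolling and stops at the first empty page.

```python
from typing import Any, Iterator


class FakeScrollClient:
    """Hypothetical in-memory stand-in that returns pages shaped like Elasticsearch scroll responses."""

    def __init__(self, docs, size):
        self.docs = list(docs)
        self.size = size
        self.offset = 0

    def _page(self):
        hits = self.docs[self.offset:self.offset + self.size]
        self.offset += len(hits)
        return {"_scroll_id": "fake-id", "hits": {"total": len(self.docs), "hits": hits}}

    def search(self):
        self.offset = 0
        return self._page()

    def scroll(self, scroll_id):
        return self._page()


def iterate_hits(client) -> Iterator[Any]:
    """Yield every hit: consume the current page first, then scroll for the next one."""
    response = client.search()
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            # An empty page, like the second response in this thread, means the scroll is exhausted.
            break
        yield from hits
        # Always pass the most recent scroll id back to the next scroll call.
        response = client.scroll(scroll_id=response["_scroll_id"])


if __name__ == "__main__":
    print(list(iterate_hits(FakeScrollClient(["doc1"], size=500))))  # ['doc1']
    print(list(iterate_hits(FakeScrollClient("abcde", size=2))))  # ['a', 'b', 'c', 'd', 'e']
```

Against a real cluster, elasticsearch-py's `elasticsearch.helpers.scan` wraps the same read-then-scroll pattern, and `Elasticsearch.clear_scroll` releases the scroll context if iteration is abandoned early.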

ranetp commented on July 20, 2024

This should now be fixed, but make sure that the "Frequency threshold" parameter you choose when starting a Language Model Task is low enough for the size of your dataset.
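Why the threshold matters on very small datasets: Word2Vec-style trainers discard tokens that occur fewer than a minimum number of times before training (gensim's `Word2Vec` calls this `min_count`, with a default of 5), so a short corpus can end up with an empty vocabulary. A sketch, assuming texta's "Frequency threshold" acts like such a minimum-count cutoff; `build_vocab` is a hypothetical helper, and the toy sentences are loosely based on the text snippet in the first scroll response.

```python
from collections import Counter


def build_vocab(sentences, min_count):
    """Keep only tokens that occur at least `min_count` times across all sentences."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token: n for token, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Hypothetical tokenisation of one short document.
    sentences = [
        ["väljaandja", "riigikogu"],
        ["akti", "liik", "seadus"],
        ["riigikogu", "seadus"],
    ]
    print(build_vocab(sentences, min_count=5))  # {} -> nothing left to train on
    print(build_vocab(sentences, min_count=2))  # {'riigikogu': 2, 'seadus': 2}
    print(len(build_vocab(sentences, min_count=1)))  # 5
```

Lowering the threshold keeps more of a small corpus in the vocabulary, at the cost of training vectors for words seen only once or twice.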
