Word Statistics File not Found. | Receiving 404 error while downloading the file. about ekphrasis HOT 17 CLOSED

cbaziotis commented on June 30, 2024 14

Word Statistics File not Found. | Receiving 404 error while downloading the file.

from ekphrasis.

Comments (17)

Nick18899 commented on June 30, 2024 8

We are deploying our project on Saturday, and the tokenizer has stopped working!!! Please fix it quickly!

zahrahnnx commented on June 30, 2024 2

Same error. Could you please help fix it?

asmhack commented on June 30, 2024 2

Uncompress and put that folder into home dir.
So should be: ~/.ekphrasis/stats/...

https://we.tl/t-hwj94h9MMJ

fucaja commented on June 30, 2024 2

I solved the problem by changing the URL in helpers.py so that it points to a new repository hosting the stats files:

!pip install git+https://github.com/fucaja/ekphrasis.git

yistarostin commented on June 30, 2024 1

Quoting asmhack:

Hi @yistarostin, a few observations I've made from your message:

That zip contains the following folders (see the screenshot), and twitter is included.

You have to unzip it. It should be a folder, not a zip file.

In Google Colab the home directory is /root. So please carefully check whether those files are available there. It should look like /root/.ekphrasis/stats/{and here the folders from the screenshot below}

Well, I repeated your steps and it worked! I guess I accidentally unzipped to /content instead of /root. Thank you and Spasibo!

shihabshahriar16 commented on June 30, 2024

same issue here!

jeremy-yuan07 commented on June 30, 2024

Same error, waiting for a solution. Thanks in advance.

yistarostin commented on June 30, 2024

Uncompress and put that folder into home dir. So should be: ~/.ekphrasis/stats/...

https://we.tl/t-hwj94h9MMJ

Hi. Thank you, with your advice I managed to fix the problem mentioned above, but now there is a new one:
I am using the tokenizer for Twitter with the following flags:

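from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons

text_processor = TextPreProcessor(
    # terms that will be normalized
    normalize=['url', 'email', 'percent', 'money', 'phone', 'user',
               'time', 'date', 'number'],
    # terms that will be annotated
    annotate={"hashtag",  # "allcaps",
              "elongated", "repeated", 'emphasis', 'censored'},
    fix_html=True,  # fix HTML tokens
    segmenter="twitter",  # corpus whose word statistics are used for segmentation
    corrector="twitter",  # corpus whose word statistics are used for spell correction
    # unpack_hashtags=True,  # perform word segmentation on hashtags
    unpack_contractions=True,  # unpack contractions (can't -> can not)
    spell_correct_elong=False,  # spell correction for elongated words
    tokenizer=SocialTokenizer(lowercase=True).tokenize,
    dicts=[emoticons]
)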

And now it says:

---TOKENIZING TWEETS NOW---
Reading twitter - 1grams ...
stats file not available!
An exception has occurred, use %tb to see the full traceback.

SystemExit: 1
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py:2890: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)

Maybe the ZIP you provided doesn't contain the files needed for tokenizing Twitter text?
By the way, I am trying to make it work in Google Colab, in case that matters.

asmhack commented on June 30, 2024

Hi @yistarostin,
a few observations I've made from your message:

That zip contains the following folders (see the screenshot), and twitter is included.
You have to unzip it. It should be a folder, not a zip file.
In Google Colab the home directory is /root. So please carefully check whether those files are available there. It should look like /root/.ekphrasis/stats/{and here the folders from the screenshot below}
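
The check described above can be sketched in a few lines of Python. This is only an illustration: list_stats_corpora is a hypothetical helper name, not part of ekphrasis, and it assumes nothing beyond the ~/.ekphrasis/stats layout mentioned in this thread.

```python
from pathlib import Path


def list_stats_corpora(home=None):
    """Return the sorted folder names found under <home>/.ekphrasis/stats.

    `home` defaults to the current user's home directory, which should be
    /root in Google Colab according to the comment above. An empty list
    means the stats folder is missing or contains no sub-folders.
    """
    home_dir = Path(home) if home is not None else Path.home()
    stats_dir = home_dir / ".ekphrasis" / "stats"
    if not stats_dir.is_dir():
        return []
    # Only directories count; a leftover stats.zip file is ignored.
    return sorted(p.name for p in stats_dir.iterdir() if p.is_dir())


# Usage in a notebook cell:
# print(Path.home(), list_stats_corpora())
```

If twitter does not appear in the returned list, the archive was probably unzipped somewhere else (for example /content) or is still a zip file.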

fucaja commented on June 30, 2024

Hi @yistarostin, I am new to GitHub. Could you explain how you got it working?

I tried running !git clone https://github.com/cbaziotis/ekphrasis.git in the /root/ folder in Colab (see the screenshot). How can I use the library?

yistarostin commented on June 30, 2024

@fucaja Hi.
To use this or any other module, you first need to install it. For instance, to install ekphrasis, simply run pip install ekphrasis from a terminal, or !pip install ekphrasis (the same command with an exclamation mark) from a notebook cell. Technically, you can clone the repository, %cd into the repository folder and then run !pip install -e . but this is an awkward way to install, as you need to know the full URL of the repository to clone it. For instance, if the repository were moved to another Git hosting platform, your code would simply stop working.
So, to install any package, just run !pip install [module name]
To use the library, write

import [module name]

in your Python code.
For instance, this module includes several classes. To use them, write:

from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons

A full example is given in the repository's README.md (on the front page).

fucaja commented on June 30, 2024

Hi @yistarostin.

After installing with !pip install, I don't know where I should put the stats files in Colab. Could you explain? Thanks in advance.

yistarostin commented on June 30, 2024

@fucaja As advised before, you need to put the ekphrasis dictionary files in /root/.ekphrasis. Under normal circumstances this is done automatically, but it is currently broken, which is why we are all here in this issue. So you need to manually download the .zip archive from the link in the previous comments, upload it to the /root folder in Colab, change directory to /root, and then unzip the archive.
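
The steps above can also be done from Python instead of shell commands. The following is a minimal sketch, not the library's own installer: extract_archive is a hypothetical helper, and it assumes the archive contains a top-level .ekphrasis folder, which this thread does not confirm.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir=None):
    """Extract zip_path into target_dir (default: the home directory).

    Returns the path where the stats folder is expected afterwards.
    ASSUMPTION: the archive has `.ekphrasis/stats/...` at its top level;
    if it does not, the returned path will not exist.
    """
    target = Path(target_dir) if target_dir is not None else Path.home()
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target / ".ekphrasis" / "stats"


# Usage in Colab, after uploading the archive to /root (file name is an example):
# stats_dir = extract_archive("/root/stats.zip", "/root")
# print(stats_dir, stats_dir.is_dir())
```

If the final line prints False, the folder inside the archive has a different layout and has to be moved by hand so that it ends up at ~/.ekphrasis/stats.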

ycchanau commented on June 30, 2024

I still get the same error. Has this been fixed already?

Word statistics files not found!
Downloading...

frankniujc commented on June 30, 2024

Here's a version of my ~/.ekphrasis from an old installation:
https://utoronto-my.sharepoint.com/:u:/g/personal/frank_niu_mail_utoronto_ca/Ed0k1JhgN8JJjmVxaBR_OzsBpMGlhhslAE9h3apvY9I_lA?e=tyZ7Nz

Unzipping it and moving home/frank/.ekphrasis to ~/.ekphrasis should solve the problem.

Note that my link is also not permanent (it is limited by my university's OneDrive/SharePoint policy). Hopefully this issue can be properly patched before the link expires.

ArlanCooper commented on June 30, 2024

Uncompress and put that folder into home dir. So should be: ~/.ekphrasis/stats/...

https://we.tl/t-hwj94h9MMJ

The original URL has expired. Could you share a new URL for downloading the dataset? Thanks.

cbaziotis commented on June 30, 2024

Initially, I used my personal Dropbox account to host the file, as only some friends and I were using the library. It turns out that Dropbox has suspended my public links for generating excessive traffic...

I moved the data to another server and updated the public link for the stats.zip file. Please update the package and try again.

Build from source:

pip install git+https://github.com/cbaziotis/ekphrasis.git

or install from PyPI:

pip install ekphrasis -U

FYI the link is https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip
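
For anyone who wants to fetch the file manually from that link, here is a small sketch using only the Python standard library. The names STATS_URL and download_stats_zip are made up for this example and are not part of ekphrasis. If the server answers with a 404, urlretrieve raises urllib.error.HTTPError.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the comment above; the constant name is hypothetical.
STATS_URL = "https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip"


def download_stats_zip(destination, url=STATS_URL):
    """Download `url` to `destination` and check that the result is a zip file.

    Raises urllib.error.HTTPError on HTTP errors such as 404, and
    ValueError if the downloaded file is not a valid zip archive.
    """
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(destination))
    if not zipfile.is_zipfile(destination):
        destination.unlink()  # do not leave a broken file behind
        raise ValueError(f"{url} did not return a valid zip archive")
    return destination


# Usage (the destination path is an example):
# download_stats_zip(Path.home() / "stats.zip")
```

The downloaded archive still has to be unpacked so that the files end up under ~/.ekphrasis/stats, as described earlier in the thread.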

Let me know if it works now.
