
Comments (17)

Nick18899 commented on June 30, 2024 8

We are deploying our project on Saturday, and the tokenizer has stopped working! Please fix it quickly!

from ekphrasis.

zahrahnnx commented on June 30, 2024 2

Same error. Could you please help fix it?

asmhack commented on June 30, 2024 2

Uncompress the archive and put the resulting folder into your home directory.
The path should then be: ~/.ekphrasis/stats/...

https://we.tl/t-hwj94h9MMJ

fucaja commented on June 30, 2024 2

I solved the problem by changing the URL in helpers.py to a new link that points to a repository with the stats files:

!pip install git+https://github.com/fucaja/ekphrasis.git

yistarostin commented on June 30, 2024 1

Hi @yistarostin, a few observations from your message:

  1. That zip contains the following folders (see the screenshot), and twitter is included.
  2. You have to unzip it. It should be a folder, not a zip file.
  3. In Google Colab the home directory is /root, so please check carefully that those files are available there. The path should look like /root/.ekphrasis/stats/{and here the folders from the screenshot below}

Screenshot 2021-10-07 at 12 31 39

Well, I repeated your steps and it worked! I guess I accidentally unzipped to /content instead of /root. Thank you and Spasibo!

shihabshahriar16 commented on June 30, 2024

same issue here!

jeremy-yuan07 commented on June 30, 2024

Same error, waiting for a solution. Thanks in advance.

yistarostin commented on June 30, 2024

Uncompress and put that folder into home dir. So should be: ~/.ekphrasis/stats/...

https://we.tl/t-hwj94h9MMJ

Hi. Thank you, with your advice I managed to fix the problem mentioned above, but now there is a new one.
I am using the tokenizer for Twitter with the following flags:

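from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons

text_processor = TextPreProcessor(
    # terms to normalize (the duplicated 'url' entry is removed)
    normalize=['url', 'email', 'percent', 'money', 'phone', 'user',
               'time', 'date', 'number'],
    # terms to annotate ("allcaps" is deliberately left out)
    annotate={"hashtag", "elongated", "repeated",
              'emphasis', 'censored'},
    fix_html=True,  # fix HTML tokens
    segmenter="twitter",  # corpus whose word statistics are used for segmentation
    corrector="twitter",  # corpus whose word statistics are used for spell correction
    # unpack_hashtags=True,  # perform word segmentation on hashtags
    unpack_contractions=True,  # unpack contractions (can't -> can not)
    spell_correct_elong=False,  # spell correction for elongated words
    tokenizer=SocialTokenizer(lowercase=True).tokenize,
    dicts=[emoticons]
)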

And now it says:

---TOKENIZING TWEETS NOW---
Reading twitter - 1grams ...
stats file not available!
An exception has occurred, use %tb to see the full traceback.

SystemExit: 1
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py:2890: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)

Maybe the ZIP you provided doesn't contain the files needed for tokenizing Twitter text?
By the way, I am trying to make it work in Google Colab, in case that matters.
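The log above ends in SystemExit, which stops the notebook cell before anything can be inspected. A minimal sketch of a workaround, assuming only what the log shows (that construction exits when a stats file is missing). `build_or_report` is a hypothetical helper and not part of ekphrasis:

```python
from pathlib import Path


def build_or_report(factory, home=None):
    """Call factory(); if it exits, print where the stats were expected.

    `factory` is any zero-argument callable, for example a lambda that
    builds a TextPreProcessor. `home` defaults to the current user's home.
    """
    home = Path(home) if home is not None else Path.home()
    expected = home / ".ekphrasis" / "stats"
    try:
        return factory()
    except SystemExit as exc:
        # Assumption based on the log: the exit signals missing stats files.
        print(f"Construction exited with code {exc.code}.")
        print(f"Expected stats directory: {expected} (exists: {expected.is_dir()})")
        return None
```

In a notebook this could be called as build_or_report(lambda: TextPreProcessor(segmenter="twitter", corrector="twitter")), so that a missing folder produces a hint instead of a bare exit.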

asmhack commented on June 30, 2024

Hi @yistarostin,
a few observations from your message:

  1. That zip contains the following folders (see the screenshot), and twitter is included.
  2. You have to unzip it. It should be a folder, not a zip file.
  3. In Google Colab the home directory is /root, so please check carefully that those files are available there. The path should look like /root/.ekphrasis/stats/{and here the folders from the screenshot below}

Screenshot 2021-10-07 at 12 31 39
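The check in point 3 can also be done from a notebook cell. A minimal sketch, assuming the layout described above (one folder per corpus under .ekphrasis/stats in the home directory). `list_stats_corpora` is a hypothetical helper name:

```python
from pathlib import Path


def list_stats_corpora(home=None):
    """Return the folder names under <home>/.ekphrasis/stats, sorted.

    Returns an empty list when the stats directory does not exist.
    Plain files (for example a zip that was never extracted) are ignored.
    """
    home = Path(home) if home is not None else Path.home()
    stats_dir = home / ".ekphrasis" / "stats"
    if not stats_dir.is_dir():
        return []
    return sorted(p.name for p in stats_dir.iterdir() if p.is_dir())
```

If "twitter" is missing from the returned list (or the list is empty), the files were probably extracted somewhere else, or the archive is still zipped.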

fucaja commented on June 30, 2024

Hi @yistarostin, I am new to using github, could you explain how it worked for you?

I tried running !git clone https://github.com/cbaziotis/ekphrasis.git in the /root/ folder in Colab (see the screenshot). How can I use the library?

ekp

yistarostin commented on June 30, 2024

@fucaja Hi.
To use this or any other module, you first need to install it. For instance, to install ekphrasis, simply run pip install ekphrasis from a terminal, or !pip install ekphrasis (the same command with an exclamation mark) from a notebook cell. Technically, you can clone the repository, %cd into the repository folder and then run !pip install -e . (note the trailing dot), but this is an unusual way to install, because you need to know the full URL of the repository to clone it. For instance, if the repository were moved to another Git hosting platform, your code would simply stop working.
So, to install any module, just run !pip install [module name].
To use the library, write

import [module name]

in your Python code.
For instance, this module includes several classes; to use them, write:

from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons

A full example is listed in the README.md of the repo (on the front page).
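Before importing, it can help to confirm that the install step actually succeeded in the current environment. A small sketch using only the standard library; `is_installed` is a hypothetical helper name:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None
```

For example, is_installed("ekphrasis") returning False in Colab usually means the !pip install cell has not been run in the current session.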

fucaja commented on June 30, 2024

Hi @yistarostin.

After installing with !pip install, I don't know where I should put the stats files in Colab. Could you explain it to me? Thanks in advance.

ekp

yistarostin commented on June 30, 2024

@fucaja As advised before, you need to put the ekphrasis dictionary files into /root/.ekphrasis. Under normal circumstances this is done automatically, but it is currently broken, which is why we are all here in this issue. So you need to manually download the .zip archive from the link mentioned in the previous comments, then upload that file to the /root folder in Colab, then change directory to /root, and then unzip the archive.
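The unzip step can be done in Python as well. A minimal sketch, assuming the archive has a top-level .ekphrasis folder (as the earlier comments suggest for the shared zip, but not verified here). `unpack_into_home` is a hypothetical helper name:

```python
import zipfile
from pathlib import Path


def unpack_into_home(zip_path, home=None):
    """Extract zip_path into the home directory and return the stats folder.

    Raises FileNotFoundError if <home>/.ekphrasis/stats is absent afterwards,
    which would mean the archive layout differs from the assumed one.
    """
    home = Path(home) if home is not None else Path.home()
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(home)
    stats_dir = home / ".ekphrasis" / "stats"
    if not stats_dir.is_dir():
        raise FileNotFoundError(
            f"{stats_dir} not found after extraction; check the archive layout"
        )
    return stats_dir
```

In Colab the call would look like unpack_into_home("/root/archive.zip", home="/root"), where the zip file name is a placeholder for whatever the uploaded file is called.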

ycchanau commented on June 30, 2024

I still get the same error. Has this already been fixed?

Word statistics files not found!
Downloading... 

frankniujc commented on June 30, 2024

Here's a version of my ~/.ekphrasis from an old installation:
https://utoronto-my.sharepoint.com/:u:/g/personal/frank_niu_mail_utoronto_ca/Ed0k1JhgN8JJjmVxaBR_OzsBpMGlhhslAE9h3apvY9I_lA?e=tyZ7Nz

Unzipping it and moving home/frank/.ekphrasis to ~/.ekphrasis should solve the problem.

Note that my link is also not permanent (it is limited by my university's OneDrive/SharePoint policy). Hopefully this issue can be properly patched before the link expires.
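Because this archive nests the folder under home/frank/, it has to be moved after extraction. A minimal sketch, assuming the archive was already extracted somewhere and that Python 3.8+ is available (for dirs_exist_ok). `relocate_ekphrasis` is a hypothetical helper name:

```python
import shutil
from pathlib import Path


def relocate_ekphrasis(extracted_root, home=None):
    """Copy the first .ekphrasis folder found under extracted_root into home.

    Existing files in <home>/.ekphrasis with the same names are overwritten.
    """
    home = Path(home) if home is not None else Path.home()
    # pathlib globbing also matches dot-folders such as .ekphrasis.
    candidates = sorted(p for p in Path(extracted_root).rglob(".ekphrasis") if p.is_dir())
    if not candidates:
        raise FileNotFoundError(f"no .ekphrasis folder found under {extracted_root}")
    destination = home / ".ekphrasis"
    shutil.copytree(candidates[0], destination, dirs_exist_ok=True)
    return destination
```

Copying rather than moving leaves the extracted tree intact, so the step can be repeated if the first attempt targeted the wrong home directory.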

ArlanCooper commented on June 30, 2024

Uncompress and put that folder into home dir. So should be: ~/.ekphrasis/stats/...

https://we.tl/t-hwj94h9MMJ

The original URL has expired. Could you share a new URL for downloading the dataset? Thanks.

cbaziotis commented on June 30, 2024

Initially, I used my personal dropbox account to host the file as only some friends and I were using the library. It turns out that dropbox has suspended my public links for generating excessive traffic...

I moved the data to another server and updated the public link for the stats.zip file. Please update the package and try again.

build from source

pip install git+https://github.com/cbaziotis/ekphrasis.git

or install from pypi

pip install ekphrasis -U

FYI the link is https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip

Let me know if it works now.
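If updating the package is not an option, the archive at the link above can be fetched by hand. A minimal sketch that only downloads the file and lists its top-level entries, since the internal layout of stats.zip is not described in this thread. `download_and_peek` is a hypothetical helper name:

```python
import urllib.request
import zipfile
from pathlib import Path

# URL quoted from the comment above.
STATS_URL = "https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip"


def download_and_peek(url, destination):
    """Download url to destination and return the archive's top-level names."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(destination))
    if not zipfile.is_zipfile(destination):
        raise ValueError(f"{destination} is not a valid zip archive")
    with zipfile.ZipFile(destination) as archive:
        # Keep only the first path component of every entry.
        return sorted({name.split("/")[0] for name in archive.namelist()})
```

Calling download_and_peek(STATS_URL, Path.home() / ".ekphrasis" / "stats.zip") shows which folder the archive unpacks to, which in turn tells you which directory to extract it into.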
