Comments (17)
We have a project deployment on Saturday, and the tokenizer has stopped working! Please fix it quickly!
from ekphrasis.
Same error. Could you please help fix it?
from ekphrasis.
Uncompress it and put that folder into your home directory.
The resulting path should be: ~/.ekphrasis/stats/...
from ekphrasis.
I solved the problem by changing the URL in helpers.py to point to a new repository that hosts the stats files:
!pip install git+https://github.com/fucaja/ekphrasis.git
from ekphrasis.
Hi @yistarostin, here are a few observations about your message:
- That zip contains the following folders (see the screenshot), and twitter is included.
- You have to unzip it. It should be a folder, not a zip file.
- In Google Colab the home directory is /root, so please check carefully that those files are available there. It should look like /root/.ekphrasis/stats/{and here the folders from the screenshot below}
Well, I repeated your steps and it worked! I guess I accidentally unzipped to /content instead of /root. Thank you and Spasibo!
from ekphrasis.
same issue here!
from ekphrasis.
Same error, waiting for a solution. Thanks in advance.
from ekphrasis.
Uncompress it and put that folder into your home directory. The resulting path should be: ~/.ekphrasis/stats/...
Hi. Thank you, with your advice I managed to fix the problem mentioned above, but now there is a new one:
I am using the tokenizer for Twitter with the following flags:
from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons

text_processor = TextPreProcessor(
    # terms that will be normalized
    normalize=['url', 'email', 'percent', 'money', 'phone', 'user',
               'time', 'date', 'number'],
    # terms that will be annotated ("allcaps" is left disabled)
    annotate={"hashtag",  # "allcaps",
              "elongated", "repeated",
              'emphasis', 'censored'},
    fix_html=True,  # fix HTML tokens
    segmenter="twitter",
    corrector="twitter",
    # unpack_hashtags=True,  # perform word segmentation on hashtags
    unpack_contractions=True,  # unpack contractions (can't -> can not)
    spell_correct_elong=False,  # spell correction for elongated words
    tokenizer=SocialTokenizer(lowercase=True).tokenize,
    dicts=[emoticons]
)
And now it says:
---TOKENIZING TWEETS NOW---
Reading twitter - 1grams ...
stats file not available!
An exception has occurred, use %tb to see the full traceback.
SystemExit: 1
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py:2890: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
Maybe the ZIP you provided doesn't contain the files needed for tokenizing Twitter text?
By the way, I am trying to make this work in Google Colab, in case that matters.
from ekphrasis.
Hi @yistarostin,
here are a few observations about your message:
- That zip contains the following folders (see the screenshot), and twitter is included.
- You have to unzip it. It should be a folder, not a zip file.
- In Google Colab the home directory is /root, so please check carefully that those files are available there. It should look like /root/.ekphrasis/stats/{and here the folders from the screenshot below}
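The check described above can be sketched in a few lines of Python. This is only a minimal illustration: the helper name list_stats_folders is made up for this example and is not part of ekphrasis. It simply lists the folders found under .ekphrasis/stats in the home directory, which is /root in Google Colab.

```python
from pathlib import Path


def list_stats_folders(home=None):
    """Return the names of the folders found under <home>/.ekphrasis/stats.

    `home` defaults to the current user's home directory (/root in Colab).
    An empty list means the stats directory does not exist there.
    """
    base = Path(home) if home is not None else Path.home()
    stats_dir = base / ".ekphrasis" / "stats"
    if not stats_dir.is_dir():
        return []
    return sorted(p.name for p in stats_dir.iterdir() if p.is_dir())
```

Calling print(list_stats_folders()) should show a twitter entry if segmenter="twitter" is to work. An empty list suggests the archive was unzipped somewhere else, or is still a zip file.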
from ekphrasis.
Hi @yistarostin, I am new to GitHub. Could you explain how it worked for you?
I tried running !git clone https://github.com/cbaziotis/ekphrasis.git in the /root/ folder in Colab (see the screenshot). How can I use the library?
from ekphrasis.
@fucaja Hi.
To use this or any other module, you first need to install it. For instance, to install this module, ekphrasis, simply run pip install ekphrasis from a terminal, or !pip install ekphrasis (the same command with an exclamation mark) from a notebook cell. Technically, you can also clone the repository, %cd into the repository folder and then run !pip install -e . (note the trailing dot), but this is a really weird way to install, as you need to know the full URL of the repository to clone it. For instance, if the repository were moved to another Git hosting platform, your code would just stop working.
So, to install any module, just run !pip install [module name]
To use the library, write
import [module name]
in your Python code.
For instance, this module includes several classes. To use them, write:
from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons
A full example is listed in the README.md of the repo (on the front page).
from ekphrasis.
Hi @yistarostin.
With !pip install, I don't know where I should put the stats files in Colab. Could you explain? Thanks in advance.
from ekphrasis.
@fucaja As advised before, you need to put the ekphrasis dictionary files in /root/.ekphrasis. Under normal circumstances this is done automatically, but it is currently broken, which is why we are here in this issue. So you need to manually download the .zip archive from the link mentioned in the previous comments, upload it to the /root folder in Colab, change directory to /root, and then unzip the archive.
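For anyone who prefers to do the unzip step from a notebook cell, here is a rough Python sketch. It assumes the uploaded archive has .ekphrasis as its top-level folder, as the archive from the earlier comments appears to; unpack_stats is a hypothetical helper name, not an ekphrasis function.

```python
import zipfile
from pathlib import Path


def unpack_stats(archive, home=None):
    """Extract `archive` into `home` and return <home>/.ekphrasis/stats.

    Assumption: the archive's top-level folder is `.ekphrasis`.
    `home` defaults to the current user's home directory (/root in Colab).
    """
    target = Path(home) if home is not None else Path.home()
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    stats_dir = target / ".ekphrasis" / "stats"
    if not stats_dir.is_dir():
        # The archive had a different layout than assumed above.
        raise FileNotFoundError(f"{stats_dir} was not created by the archive")
    return stats_dir
```

In Colab this would be called as unpack_stats("/root/stats.zip"), where the file name is whatever the uploaded archive is called. If it raises FileNotFoundError, the archive has a different layout and its contents need to be moved by hand.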
from ekphrasis.
I still get the same error. Has this been fixed yet?
Word statistics files not found!
Downloading...
from ekphrasis.
Here's a version of my ~/.ekphrasis from an old installation:
https://utoronto-my.sharepoint.com/:u:/g/personal/frank_niu_mail_utoronto_ca/Ed0k1JhgN8JJjmVxaBR_OzsBpMGlhhslAE9h3apvY9I_lA?e=tyZ7Nz
Unzipping it and putting home/frank/.ekphrasis at ~/.ekphrasis should solve the problem.
Note that my link is also not permanent (it is limited by my university's OneDrive SharePoint policy). Hopefully this issue can be properly patched before the link expires.
from ekphrasis.
Uncompress it and put that folder into your home directory. The resulting path should be: ~/.ekphrasis/stats/...
The original URL has expired. Could you provide a new URL to download the dataset? Thanks.
from ekphrasis.
Initially, I used my personal Dropbox account to host the file, as only some friends and I were using the library. It turns out that Dropbox has suspended my public links for generating excessive traffic...
I moved the data to another server and updated the public link for the stats.zip file. Please update the package and try again.
Build from source:
pip install git+https://github.com/cbaziotis/ekphrasis.git
or install from PyPI:
pip install ekphrasis -U
FYI the link is https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip
Let me know if it works now.
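If the automatic download still fails after updating, the archive can also be fetched by hand. The sketch below is only an illustration: fetch_archive is a hypothetical helper, not part of ekphrasis, and since the internal layout of stats.zip is not described in this thread, it only downloads the file and reports the top-level entries so you can decide where to extract it.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# Link quoted in the comment above.
STATS_URL = "https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip"


def fetch_archive(url, dest):
    """Download `url` to `dest` and return the archive's top-level entry names."""
    dest = Path(dest)
    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    with zipfile.ZipFile(dest) as zf:
        return sorted({name.split("/")[0] for name in zf.namelist()})


# Example call (performs a real download, so it is left commented out):
# print(fetch_archive(STATS_URL, "stats.zip"))
```

The returned names show whether the archive starts at .ekphrasis, at stats, or at something else, which determines whether it belongs in the home directory or inside ~/.ekphrasis.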
from ekphrasis.