
pyeunjeon's Introduction



Koshort

koshort is a Python package for streaming and processing Korean internet trends... or maybe a Korean domestic cat 🐱


Installation

The easiest way to install the latest version is by using pip/easy_install to pull it from PyPI:

pip install koshort

You may also use Git to clone the repository from GitHub and install it manually:

git clone https://github.com/koshort/koshort.git
cd koshort
python setup.py install

Python 3.3, 3.4, 3.5, and 3.6 are supported.
Python 3.7 compatibility will be available soon.

Examples

Use the out-of-the-box script

$ stream_naver
display_rank = False
filename = trends.txt
interval = 60
n_limits = 10
verbose = True
시크릿 마더
무법변호사
신아영
미얀마
로드fc
소진
위너
불후의명곡
그것이 알고싶다
짠내투어
아는형님
로또
로또806회
msi
전지적 참견 시점
김재훈
아이돌룸
토익
아오르꺼러
같이 살래요

Use the koshort API

$ python
>>> from koshort.stream import naver
>>> streamer = naver.NaverStreamer()
>>> streamer.stream()
cj채용
온주완의 뮤직쇼
유상무
현대차
...

Copyright

Copyright 2018 by Nyanye with 💜


pyeunjeon's Issues

Mac: setting the MeCab path in a virtualenv

I installed mecab from bash first, then ran pip install eunjeon inside a Python virtualenv.
When I then tried to run it in a Jupyter notebook, I was told to set the path again.

I checked, and the mecabrc file of the eunjeon library installed in the virtualenv is there. How can I resolve this error?

Here is the error log.

    104                 self.tagger = Tagger('--rcfile %s' % dicpath)
    105             except RuntimeError:
--> 106                 raise Exception('The MeCab dictionary does not exist at "%s". Is the dictionary correctly installed?\nYou can also try entering the dictionary path when initializing the Mecab class: "Mecab(\'/some/dic/path\')"' % dicpath)
    107         except NameError:
    108             raise Exception('Install MeCab in order to use it: https://github.com/koshort/pyeunjeon/')

Exception: The MeCab dictionary does not exist at "/Users/codeblock/.virtualenvs/notebooks/lib/python3.7/site-packages/eunjeon/data/mecabrc". Is the dictionary correctly installed?
You can also try entering the dictionary path when initializing the Mecab class: "Mecab('/some/dic/path')"

Is mecab-ko-dic not supported?

Is it not possible to add a user dictionary using mecab-ko-dic?

I installed mecab-ko-dic on Ubuntu, copied only the contents under mecab/dic/mecab-ko-dic to Windows, and passed that path as Mecab(dicpath=mypath), but I get the error message below. Is this unsupported, or did I do something wrong?

>>> from konlpy.tag import Mecab as kmecab
>>> m1 = Mecab(dicpath='d:/sh_channel/mecab__dic__mecab-ko-dic')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\pythonenvs\textanal3664\lib\site-packages\eunjeon\_mecab.py", line 106, in __init__
    raise Exception('The MeCab dictionary does not exist at "%s". Is the dictionary correctly installed?\nYou can also try entering the dictionary path when initializing the Mecab class: "Mecab(\'/some/dic/path\')"' % dicpath)
Exception: The MeCab dictionary does not exist at "d:/sh_channel/mecab__dic__mecab-ko-dic". Is the dictionary correctly installed?
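
Both reports above end in the same exception, so it may help to inspect the path before passing it to Mecab(). The traceback in the first issue shows the value being handed to MeCab as `--rcfile`, and the default path it reports ends in `data/mecabrc`. This suggests that this eunjeon version may expect the path of a mecabrc file rather than a dictionary directory, but that is an inference from the traceback, not confirmed behaviour. The sketch below is a hypothetical helper (`describe_dicpath` is not part of eunjeon) that reports whether a path is missing, a file, or a directory holding the files a compiled MeCab dictionary normally contains.

```python
from pathlib import Path

# Files a compiled MeCab dictionary directory normally contains.
REQUIRED_DIC_FILES = ("sys.dic", "unk.dic", "matrix.bin", "char.bin", "dicrc")


def describe_dicpath(dicpath):
    """Return a short diagnosis of a path before handing it to Mecab().

    Hypothetical helper for illustration; it only looks at the file system
    and does not call eunjeon or MeCab.
    """
    path = Path(dicpath)
    if not path.exists():
        return "missing: path does not exist"
    if path.is_file():
        return "file: usable as an rcfile only if it is a valid mecabrc"
    missing = [name for name in REQUIRED_DIC_FILES if not (path / name).is_file()]
    if missing:
        return "directory: missing " + ", ".join(missing)
    return "directory: looks like a compiled dictionary"


if __name__ == "__main__":
    # Path taken from the report above; replace it with your own.
    print(describe_dicpath("d:/sh_channel/mecab__dic__mecab-ko-dic"))
```

If the result is "directory: looks like a compiled dictionary" and Mecab() still fails, the likelier cause is that a directory was passed where a mecabrc file is expected.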

Installing on Windows

C:\python_test\mecab_test> pip install eunjeon

I ran the command above; mecab is already installed.

PS C:\python_test\mecab_test> pip install eunjeon
Collecting eunjeon
Using cached https://files.pythonhosted.org/packages/68/90/3232725f974abf6d38f1e2cfd7a6b958337133b3fdc5b3e8994e03d7c2d3/eunjeon-0.4.0.tar.gz
Installing collected packages: eunjeon
Running setup.py install for eunjeon ... error
ERROR: Complete output from command 'c:\python37\python.exe' -u -c 'import setuptools, tokenize;file='"'"'C:\Users\ADMINI1\AppData\Local\Temp\pip-install-w5zs53ch\eunjeon\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\ADMINI1\AppData\Local\Temp\pip-record-m05v6c9\install-record.txt' --single-version-externally-managed --compile:
ERROR: running install
running build
running build_py
creating build
creating build\lib.win32-3.7
creating build\lib.win32-3.7\eunjeon
copying eunjeon\constants.py -> build\lib.win32-3.7\eunjeon
copying eunjeon\mecab.py -> build\lib.win32-3.7\eunjeon
copying eunjeon\_mecab.py -> build\lib.win32-3.7\eunjeon
copying eunjeon\__init__.py -> build\lib.win32-3.7\eunjeon
creating build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\char.bin -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\matrix.bin -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\model.bin -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\char.def -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\feature.def -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\left-id.def -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\pos-id.def -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\rewrite.def -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\right-id.def -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\unk.def -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\sys.dic -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\unk.dic -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\dicrc -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\mecabrc -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\libmecab.dll -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\mecab-cost-train.exe -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\mecab-dict-gen.exe -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\mecab-dict-index.exe -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\mecab-system-eval.exe -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\mecab-test-gen.exe -> build\lib.win32-3.7\eunjeon\data
copying eunjeon\data\mecab.exe -> build\lib.win32-3.7\eunjeon\data
creating build\lib.win32-3.7\eunjeon\data\sdk
copying eunjeon\data\sdk\libmecab.lib -> build\lib.win32-3.7\eunjeon\data\sdk
copying eunjeon\data\sdk\mecab-cost-train.lib -> build\lib.win32-3.7\eunjeon\data\sdk
copying eunjeon\data\sdk\mecab-dict-gen.lib -> build\lib.win32-3.7\eunjeon\data\sdk
copying eunjeon\data\sdk\mecab-dict-index.lib -> build\lib.win32-3.7\eunjeon\data\sdk
copying eunjeon\data\sdk\mecab-system-eval.lib -> build\lib.win32-3.7\eunjeon\data\sdk
copying eunjeon\data\sdk\mecab-test-gen.lib -> build\lib.win32-3.7\eunjeon\data\sdk
copying eunjeon\data\sdk\mecab.h -> build\lib.win32-3.7\eunjeon\data\sdk
copying eunjeon\data\sdk\mecab.lib -> build\lib.win32-3.7\eunjeon\data\sdk
running build_ext
building '_MeCab' extension
creating build\temp.win32-3.7
creating build\temp.win32-3.7\Release
creating build\temp.win32-3.7\Release\eunjeon
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.21.27702\bin\HostX86\x86\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Ieunjeon/data/sdk -Ic:\python37\include -Ic:\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.21.27702\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.21.27702\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpeunjeon/MeCab_wrap.cxx /Fobuild\temp.win32-3.7\Release\eunjeon/MeCab_wrap.obj
MeCab_wrap.cxx
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.21.27702\bin\HostX86\x86\link.exe /nologo /INCREMENTAL:NO /LTCG /nodefaultlib:libucrt.lib ucrt.lib /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:eunjeon/data/sdk /LIBPATH:c:\python37\libs /LIBPATH:c:\python37\PCbuild\win32 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.21.27702\ATLMFC\lib\x86" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.21.27702\lib\x86" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x86" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x86" libmecab.lib /EXPORT:PyInit__MeCab build\temp.win32-3.7\Release\eunjeon/MeCab_wrap.obj /OUT:build\lib.win32-3.7_MeCab.cp37-win32.pyd /IMPLIB:build\temp.win32-3.7\Release\eunjeon_MeCab.cp37-win32.lib
Creating library build\temp.win32-3.7\Release\eunjeon_MeCab.cp37-win32.lib and object build\temp.win32-3.7\Release\eunjeon_MeCab.cp37-win32.exp
MeCab_wrap.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) class MeCab::Tagger * __cdecl MeCab::createTagger(char const *)" (_imp?createTagger@MeCab@@YAPAVTagger@1@PBD@Z)
MeCab_wrap.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) class MeCab::Lattice * __cdecl MeCab::createLattice(void)" (_imp?createLattice@MeCab@@YAPAVLattice@1@XZ)
MeCab_wrap.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) class MeCab::Model * __cdecl MeCab::createModel(char const *)" (_imp?createModel@MeCab@@YAPAVModel@1@PBD@Z)
MeCab_wrap.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) char const * __cdecl MeCab::getLastError(void)" (_imp?getLastError@MeCab@@YAPBDXZ)
MeCab_wrap.obj : error LNK2001: unresolved external symbol "public: static char const * __cdecl MeCab::Tagger::version(void)" (?version@Tagger@MeCab@@SAPBDXZ)
MeCab_wrap.obj : error LNK2001: unresolved external symbol "public: static bool __cdecl MeCab::Tagger::parse(class MeCab::Model const &,class MeCab::Lattice *)" (?parse@Tagger@MeCab@@SA_NABVModel@2@PAVLattice@2@@z)
MeCab_wrap.obj : error LNK2001: unresolved external symbol "public: static char const * __cdecl MeCab::Model::version(void)" (?version@Model@MeCab@@SAPBDXZ)
build\lib.win32-3.7_MeCab.cp37-win32.pyd : fatal error LNK1120: 7 unresolved externals
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.21.27702\bin\HostX86\x86\link.exe' failed with exit status 1120
----------------------------------------
ERROR: Command "'c:\python37\python.exe' -u -c 'import setuptools, tokenize;file='"'"'C:\Users\ADMINI1\AppData\Local\Temp\pip-install-w5zs53ch\eunjeon\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\ADMINI1\AppData\Local\Temp\pip-record-_m05v6c9\install-record.txt' --single-version-externally-managed --compile" failed with error code 1 in C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-w5zs53ch\eunjeon\

The installation fails as shown in the log above.

The system is 64-bit.
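
The build directories in the log (`build\lib.win32-3.7`) and the `HostX86\x86` compiler path indicate that the extension is being compiled for a 32-bit Python interpreter, even though the operating system is reported as 64-bit. Whether that mismatch is what leaves the MeCab symbols unresolved is an assumption, but it is quick to rule in or out. The sketch below uses only the standard library to report the architecture of the running interpreter; `interpreter_bits` is a name introduced here for illustration.

```python
import platform
import struct
import sysconfig


def interpreter_bits():
    """Return the pointer width of the running Python interpreter, in bits."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("interpreter:", interpreter_bits(), "bit")
    # On Windows this is typically "win32" for 32-bit builds and
    # "win-amd64" for 64-bit builds.
    print("build platform:", sysconfig.get_platform())
    print("machine:", platform.machine())
```

Run it with the same `python.exe` that pip uses (here `c:\python37\python.exe`), since several interpreters can be installed side by side.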

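Tokenization issue

Hello,

I am using Mecab through eunjeon on Windows 10.
I found an error while analyzing a corpus, so I am reporting it here.

When the words 캘리 and 에듀 are followed by a space, the space is included in the token.
In other words, '캘리' and '캘리 ' exist as two separate tokens. The same goes for 에듀.
Please see the example below.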

from eunjeon import Mecab

tokenizer = Mecab()

sent1 = [x for (x,y) in tokenizer.pos('고구마참치비빔밥')]
sent1
>>> ['고구마', '참치', '비빔밥']
sent2 = [x for (x,y) in tokenizer.pos('고구마  참치  비빔밥')]
sent2
>>> ['고구마', '참치', '비빔밥']
sent1 = [x for (x,y) in tokenizer.pos('캘리참치비빔밥')]
sent1
>>> ['캘리', '참치', '비빔밥']
sent2 = [x for (x,y) in tokenizer.pos('캘리  참치  비빔밥')]
sent2
>>> ['캘리 ', '참치', '비빔밥']
sent3 = [x for (x,y) in tokenizer.pos('고구마  캘리  비빔밥')]
sent3
>>> ['고구마', '캘리 ', '비빔밥']
sent1[0] == sent2[0]
>>> False   # '캘리' != '캘리 '
sent2[0] == sent3[1]
>>> True
sent1[0] == sent2[0].strip()
>>> True   # stripping the whitespace resolves it
sent3[1][-1] == ' '  # check that the trailing character is a space
>>> True


sent1 = [x for (x,y) in tokenizer.pos('고구마에듀비빔밥')]
sent1
>>> ['고구마', '에듀', '비빔밥']
sent2 = [x for (x,y) in tokenizer.pos('고구마 에듀 비빔밥')]
sent2
>>> ['고구마', '에듀 ', '비빔밥']
sent1[1] == sent2[1]
>>> False  # '에듀' != '에듀 '
sent1[1] == sent2[1].strip()
>>> True
sent2[1][-1] == ' '  # check that the trailing character is a space
>>> True
sent3 = [x for (x,y) in tokenizer.pos('고구마초코비빔밥')]
sent3
>>> ['고구마', '초코', '비빔밥']
sent4 = [x for (x,y) in tokenizer.pos('고구마 초코 비빔밥')]
sent4
>>> ['고구마', '초코', '비빔밥']

These are only the cases that showed up in my corpus; there may be other tokens that misbehave in the same way.
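
Until the tokenizer itself is fixed, the `.strip()` comparison in the transcript above can be applied to every token as a post-processing step. The following is a minimal sketch of that workaround, not a fix in eunjeon: `strip_surfaces` is a hypothetical helper, and the part-of-speech tags in the sample data are placeholders rather than real tagger output.

```python
def strip_surfaces(tagged):
    """Strip surrounding whitespace from each surface form.

    `tagged` is a list of (surface, tag) pairs shaped like the result of
    tokenizer.pos(...). Tokens that are empty after stripping are dropped.
    """
    cleaned = []
    for surface, tag in tagged:
        surface = surface.strip()
        if surface:
            cleaned.append((surface, tag))
    return cleaned


if __name__ == "__main__":
    # Hypothetical data mimicking the reported behaviour; tags are placeholders.
    sample = [("고구마", "TAG"), ("캘리 ", "TAG"), ("비빔밥", "TAG")]
    print([surface for surface, _ in strip_surfaces(sample)])
    # prints ['고구마', '캘리', '비빔밥']
```

With a real tokenizer the call would be `strip_surfaces(tokenizer.pos(text))`, which keeps '캘리' and '캘리 ' from being counted as different tokens.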
