Comments (10)

isnowfy commented on May 19, 2024

Yes, you can. The sentiment.train call shown in the README is exactly that: training on your own documents.

from snownlp.

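ynyxxy commented on May 19, 2024

So the documents only need to be labeled, right? Will it then use the built-in tools to segment the text and extract keywords, and train with Bayes? By the way, you reply really promptly.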

isnowfy commented on May 19, 2024

Yes, just follow roughly the same format as those text files. I get email notifications, so I see new comments quickly.
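As a rough illustration of the flow discussed here (labeled documents, tokenization, then Bayes training), the following is a minimal multinomial Naive Bayes sketch in plain Python. It is not snownlp's implementation. The `tokenize` helper, which only splits on whitespace, the `NaiveBayes` class, and the toy data are all assumptions made for illustration; a real Chinese pipeline would use a word segmenter instead.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical stand-in for a real word segmenter: split on whitespace.
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled corpus (assumed data, one document per entry).
docs = [
    ("great phone love it", "pos"),
    ("love the screen", "pos"),
    ("terrible battery hate it", "neg"),
    ("hate the slow screen", "neg"),
]
model = NaiveBayes()
model.train(docs)
print(model.classify("love it"))
print(model.classify("hate battery"))
```

The only input the sketch needs is text paired with a label, which matches the point above that labeled documents in a consistent format are enough to train.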

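ynyxxy commented on May 19, 2024

I looked through the code. Setting aside more complex models and segmentation accuracy, the main difficulty in the analysis lies in feature extraction, right? Do you have any tricks from experience? What tends to work better?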

isnowfy commented on May 19, 2024

Take a look at http://52opencourse.com/222/%E6%96%AF%E5%9D%A6%E7%A6%8F%E5%A4%A7%E5%AD%A6%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86%E7%AC%AC%E5%85%AD%E8%AF%BE-%E6%96%87%E6%9C%AC%E5%88%86%E7%B1%BB%EF%BC%88text-classification%EF%BC%89 and http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf for some basic features. Common ones in this area are n-grams, recognition of nouns and verbs, and recognition of sentiment words. Negated sentences also need special handling, and some people use models such as RNNs.
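Two of the features mentioned above, n-grams and negation handling, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `NEGATIONS` and `PUNCTUATION` sets are toy lists, the `NOT_` prefix is one commonly used marking scheme rather than anything snownlp does, and the input is pre-tokenized English for readability.

```python
NEGATIONS = {"not", "no", "never"}        # assumed toy negation list
PUNCTUATION = {".", ",", "!", "?", ";"}   # assumed clause boundaries


def mark_negation(tokens):
    """Prefix tokens that follow a negation word with NOT_ until punctuation."""
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False
            out.append(tok)
        elif tok in NEGATIONS:
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "i do not like this movie , but i like the music".split()
marked = mark_negation(tokens)
features = ngrams(marked, 1) + ngrams(marked, 2)  # unigrams plus bigrams
print(marked)
```

After marking, "like" inside the negated clause becomes a different feature (`NOT_like`) from "like" in the second clause, so a bag-of-words classifier can weight them differently.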

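ynyxxy commented on May 19, 2024

By the way, after generating n-grams, is the usual approach to compute chi-square (CHI-2) or mutual information (MI) and then take the top K from each document to build the feature vector? With a very large number of texts the dimensionality blows up badly. How do you usually deal with that?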
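For reference, chi-square scoring of terms against a class can be sketched as below. This is a minimal illustration, not snownlp code: the function names and the toy corpus are assumptions, and the sketch ranks terms over the whole corpus and keeps the top K, which is one common variant of the selection step asked about here.

```python
from collections import Counter


def chi_square_scores(docs, target_label):
    """Score each term by chi-square against target_label.

    docs is a list of (tokens, label) pairs; counts are document frequencies.
    """
    n = len(docs)
    n_pos = sum(1 for _, label in docs if label == target_label)
    df_all, df_pos = Counter(), Counter()
    for tokens, label in docs:
        for term in set(tokens):
            df_all[term] += 1
            if label == target_label:
                df_pos[term] += 1

    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]        # in class, contains term
        b = df - a              # not in class, contains term
        c = n_pos - a           # in class, lacks term
        d = n - n_pos - b       # not in class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (assumed data).
docs = [
    (["good", "phone"], "pos"),
    (["good", "screen"], "pos"),
    (["bad", "phone"], "neg"),
    (["bad", "screen"], "neg"),
]
scores = chi_square_scores(docs, "pos")
print(top_k(scores, 2))
```

In this toy corpus "good" and "bad" perfectly separate the classes and get the highest score, while "phone" and "screen" appear equally in both classes and score zero.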

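isnowfy commented on May 19, 2024

That only increases the total dimensionality; the dimensionality of any single document stays within a manageable range. If you find the total dimensionality too large as well, consider the hashing trick.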
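A minimal sketch of the hashing trick, assuming a hypothetical `hash_features` helper and an arbitrary bucket count: each token is hashed into a fixed number of buckets, so the vector width no longer grows with the vocabulary, and each document is stored sparsely with only its nonzero buckets.

```python
import zlib


def hash_features(tokens, n_buckets=2 ** 10):
    """Map tokens to a sparse {bucket_index: count} vector of fixed width."""
    vec = {}
    for tok in tokens:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        idx = zlib.crc32(tok.encode("utf-8")) % n_buckets
        vec[idx] = vec.get(idx, 0) + 1
    return vec


vec = hash_features(["good", "phone", "good"], n_buckets=16)
print(vec)
```

The trade-off is that distinct tokens can collide in the same bucket; a larger `n_buckets` reduces collisions at the cost of a wider vector.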

ynyxxy commented on May 19, 2024

I couldn't find where the stop-word feature is called, although I did see the stop-word dictionary. Will it be added in the future?

isnowfy commented on May 19, 2024

Stop words are supported. You can use them like this: https://github.com/isnowfy/snownlp/blob/master/snownlp/__init__.py#L61
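The linked line shows the actual call in snownlp. Conceptually, stop-word filtering amounts to something like the following sketch; the `remove_stop_words` helper and the tiny `STOP_WORDS` set are hypothetical and are not snownlp's API or dictionary.

```python
# Hypothetical stop-word set; in practice this would be loaded from a dictionary file.
STOP_WORDS = {"the", "a", "of", "is"}


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set, preserving order."""
    return [w for w in words if w not in stop_words]


filtered = remove_stop_words(["the", "screen", "is", "great"])
print(filtered)
```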

ynyxxy commented on May 19, 2024

Got it! I'll write my own sentiment analysis demo and come back with questions once it's done.
