mswon / sentimental-analysis
Sentimental Analysis with Naver movie ratings
Hello, Minseop Won,
I am following along with your Naver movie sentiment analysis video.
for epoch in range(30):
    model.train(tokens,model.corpus_count,epochs = model.iter)
When I run this for loop,
I get the following ValueError related to epochs.
ValueError: You must specify either total_examples or total_words,
for proper job parameters updationand progress calculations.
The usual value is total_examples=model.corpus_count.
Could you advise me on how to fix this?
Thank you.
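The error message itself names the usual fix: pass the corpus size by keyword as `total_examples`. Below is a minimal sketch, assuming `model` is a gensim `Word2Vec` instance (or any object exposing the same `build_vocab`, `train`, and `corpus_count` interface). The helper name `train_with_counts` is hypothetical, and `epochs` is passed explicitly because the attribute holding the default (`model.iter` in the snippet above) is named differently across gensim releases.

```python
def train_with_counts(model, sentences, epochs=5):
    """Build the vocabulary, then train with an explicit example count.

    `model` is assumed to behave like gensim's Word2Vec: build_vocab() sets
    corpus_count, and train() accepts total_examples and epochs keywords.
    """
    model.build_vocab(sentences)
    # Passing total_examples by keyword is what the ValueError asks for.
    model.train(sentences, total_examples=model.corpus_count, epochs=epochs)
    return model
```

With gensim installed, this could be called as `train_with_counts(gensim.models.Word2Vec(sg=1, seed=1234), tokens, epochs=30)`. A single call with `epochs=30` also lets gensim decay the learning rate from `alpha` to `min_alpha` on its own, which may make the outer `for` loop unnecessary.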
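I followed the code using a different text file. I don't know the cause, but when I open TensorBoard the plot is drawn, yet each point is labeled with a number instead of the Korean word. Do you happen to know how to fix this?
With Word2Vec, most_similar prints reasonable values, so the training itself seems to have worked,
but the TensorBoard part is giving me trouble.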
import gensim
import codecs
import os
import numpy as np
doc = open(r'C:\Users\Lab01\Desktop\JYR\jupyter\new\tweet_all_word.txt', 'r')
doc1 = doc.readlines()
total_word = []
for line in doc1:
    words = line.split(' ')
    final_word = []
    for word in words:
        final_word.append(word)
    total_word.append(final_word)
tokens = total_word
model = gensim.models.Word2Vec(size=300,sg = 1, alpha=0.025,min_alpha=0.025, seed=1234)
model.build_vocab(tokens)
for epoch in range(30):
    model.train(tokens,model.corpus_count,epochs = model.iter)
    model.alpha -= 0.002
    model.min_alpha = model.alpha
model.save('Word2vec_tweet.model')
print (model.most_similar(positive=["기후변화"], topn=30))
import tensorflow as tf
max_size = len(model.wv.vocab)-1
w2v = np.zeros((max_size,model.layer1_size))
with codecs.open(r"C:\Users\Lab01\Desktop\JYR\jupyter\new\metadata2.tsv",'w+',encoding='utf-8') as file_metadata:
    for i,word in enumerate(model.wv.index2word[:max_size]):
        w2v[i] = model.wv[word]
        file_metadata.write(word + "\n")
from tensorflow.contrib.tensorboard.plugins import projector
sess = tf.InteractiveSession()
with tf.device("/cpu:0"):
    embedding = tf.Variable(w2v, trainable = False, name = 'embedding')
tf.global_variables_initializer().run()
path = 'word2vec'
saver = tf.train.Saver()
writer = tf.summary.FileWriter(path, sess.graph)
config = projector.ProjectorConfig()
embed = config.embeddings.add()
embed.tensor_name = 'embedding'
embed.metadata_path = r'C:\Users\Lab01\Desktop\JYR\jupyter\new\metadata2.tsv'
projector.visualize_embeddings(writer, config)
saver.save(sess, path + '/model.ckpt' , global_step=max_size)
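One possible cause, offered as a guess rather than a diagnosis: `line.split(' ')` leaves the trailing newline attached to the last word of every line, so writing `word + "\n"` for such a word produces an extra blank line in the metadata file. The metadata then has more lines than the embedding has rows, and the projector may fall back to showing point indices when it cannot use the labels. The sketch below uses only the standard library; `tokenize_lines` and `write_metadata` are hypothetical helper names, and the sample sentences are made up.

```python
import os
import tempfile


def tokenize_lines(lines):
    """Split on any whitespace so no token keeps a trailing newline."""
    return [line.split() for line in lines if line.strip()]


def write_metadata(words, path):
    """Write exactly one label per line and return the number of lines."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            # strip() removes stray newlines that would add extra rows.
            label = word.strip() or "<empty>"
            f.write(label + "\n")
            count += 1
    return count


# Made-up sample input standing in for the lines of the tweet file.
tokens = tokenize_lines(["기후변화 대응\n", "탄소 배출\n"])
vocab = [word for sentence in tokens for word in sentence]
metadata_path = os.path.join(tempfile.mkdtemp(), "metadata.tsv")
rows_written = write_metadata(vocab, metadata_path)
```

Before launching TensorBoard, it may be worth checking that the number of lines written equals the number of rows in `w2v`.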
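Is it possible to have several criteria like this?
A 1/0 value for criterion 1
A 1/0 value for criterion 2
A 1/0 value for criterion 3
....
Currently the model is trained on labels of 0 and 1, meaning negative and positive.
Something 1: 1 if close, 0 if different
Something 2: 1 if close, 0 if different
...
In the manner shown above.
Can several criteria be set up like this?
Is this what is called a dimension?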
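The scheme described above, one 0/1 value per criterion, is usually called multi-label classification, and the "dimension" would then be the length of the label vector. A minimal sketch of that encoding follows; the criterion names are made up for illustration and do not come from the repository.

```python
# Hypothetical criteria; each one gets its own slot in the label vector.
CRITERIA = ["positive", "about_acting", "about_story"]


def encode_labels(tags):
    """Turn a set of criterion names into a 0/1 vector, one slot per criterion."""
    return [1 if name in tags else 0 for name in CRITERIA]


labels = encode_labels({"positive", "about_story"})
print(labels)  # [1, 0, 1]
```

Each slot could then be trained as its own binary target, for instance with one sigmoid output per criterion instead of the single positive/negative output.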
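I would like to crawl posts containing specific keywords from an online community and run sentiment analysis on them.
If I use the train_set from your code, would sentiment analysis also work on everyday sentences rather than movie reviews?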