Discussions from cjymz886's text-cnn

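Why are all predictions the same label at test time?

My training set has 7 labels, each with 5,000 training samples. Training completes fine, but when I test the model every predicted label is the same (for example, all "mobile phone").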

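How are words that are missing from the word2vec vocabulary represented?

Hello, if the text to be predicted contains words that are not in the word-vector table, how are they represented? Thank you!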
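The question above comes down to what happens when a token has no entry in the word-to-id mapping. A minimal sketch of two common strategies follows: dropping unknown tokens, or mapping them to a shared unknown id. The function name `encode`, the `<UNK>` token, and the toy vocabulary are assumptions for illustration and are not taken from the repository.

```python
def encode(tokens, word_to_id, unk_token=None):
    """Convert tokens to ids.

    With unk_token=None, out-of-vocabulary tokens are silently dropped.
    Otherwise they are mapped to the id of unk_token.
    """
    if unk_token is None:
        return [word_to_id[t] for t in tokens if t in word_to_id]
    unk_id = word_to_id[unk_token]
    return [word_to_id.get(t, unk_id) for t in tokens]


# Hypothetical toy vocabulary.
vocab = {"<PAD>": 0, "<UNK>": 1, "phone": 2, "price": 3}
tokens = ["phone", "foldable", "price"]

dropped = encode(tokens, vocab)                     # unknown word removed
mapped = encode(tokens, vocab, unk_token="<UNK>")   # unknown word -> id 1
```

Dropping keeps the input short but loses the position of the unknown word; mapping to a shared id keeps sentence length and lets the model learn a single vector for "unknown".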

TCNN

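In practice, a Text-CNN can use several convolution kernels of different sizes during training, concatenate their outputs into one intermediate layer, and then feed that layer into multiple dense layers. You can try something like the following:

    # Assumes Keras; `embedding` is the output tensor of an Embedding layer defined elsewhere.
    from keras.layers import Convolution1D, MaxPooling1D, Flatten, concatenate

    # First convolution layer (200 filters, kernel size 4)
    conv_4_layer = Convolution1D(200, 4, activation='tanh')(embedding)
    # First pooling layer
    max_pool_4_layer = MaxPooling1D(4)(conv_4_layer)
    # First flatten layer
    flat_4_layer = Flatten()(max_pool_4_layer)

    # Second convolution layer (200 filters, kernel size 5)
    conv_5_layer = Convolution1D(200, 5, activation='tanh')(embedding)
    # Second pooling layer
    max_pool_5_layer = MaxPooling1D(5)(conv_5_layer)
    # Second flatten layer
    flat_5_layer = Flatten()(max_pool_5_layer)

    # Third convolution layer (200 filters, kernel size 6)
    conv_6_layer = Convolution1D(200, 6, activation='tanh')(embedding)
    # Third pooling layer
    max_pool_6_layer = MaxPooling1D(6)(conv_6_layer)
    # Third flatten layer
    flat_6_layer = Flatten()(max_pool_6_layer)

    # Concatenate the three branches
    CNNs = concatenate([flat_4_layer, flat_5_layer, flat_6_layer])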
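
To get a feel for how wide the concatenated layer becomes, the output size of each branch can be worked out by hand. The sketch below assumes the Keras defaults (convolution with 'valid' padding and stride 1, pooling with stride equal to the pool size) and a hypothetical input length of 600 tokens; neither the helper names nor the input length come from the post.

```python
def branch_features(seq_len, kernel_size, filters=200):
    """Flattened feature count of one conv -> max-pool -> flatten branch."""
    # 'valid' convolution with stride 1 shortens the sequence by kernel_size - 1.
    conv_len = seq_len - kernel_size + 1
    # Max pooling with pool size == stride == kernel_size, 'valid' padding.
    pool_len = (conv_len - kernel_size) // kernel_size + 1
    return pool_len * filters


def concat_features(seq_len, kernel_sizes=(4, 5, 6), filters=200):
    """Width of the layer obtained by concatenating all branches."""
    return sum(branch_features(seq_len, k, filters) for k in kernel_sizes)


# Hypothetical sequence length of 600 tokens.
widths = [branch_features(600, k) for k in (4, 5, 6)]
total = concat_features(600)
print(widths, total)
```

Because each branch keeps many pooled positions, the concatenated layer is large; pooling over the whole sequence instead would leave only `filters` values per branch.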

Problem when running text_test.py

Along with this error I also get an OOM:
ResourceExhaustedError: OOM when allocating tensor with shape[10000,256,1,596] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bf,
Please help take a look.
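
The tensor shape in the error message gives a rough idea of why the allocation fails. The sketch below computes its size in bytes and shows a plain batching helper; it assumes that `float` means 4 bytes per element and that the leading dimension 10000 is the number of test samples fed in a single pass, which the post does not confirm. The helper names and the batch size of 128 are hypothetical.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def batch_iter(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


full = tensor_bytes((10000, 256, 1, 596))   # shape from the error message
small = tensor_bytes((128, 256, 1, 596))    # same layer, 128 samples at a time
print(round(full / 2**30, 2), "GiB vs", round(small / 2**20, 1), "MiB")

# Evaluating in batches: each slice would be fed to the model separately.
batches = list(batch_iter(list(range(10)), 4))
```

Under these assumptions the single tensor needs about 5.7 GiB, while a batch of 128 needs under 80 MiB, so feeding the test set in slices and collecting the predictions is one way to stay within memory.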

Error when running text_train.py

Hello, and thank you very much for sharing!
After switching to my own dataset and running train_word2vec.py to retrain the word vectors, text_train.py fails with:
ValueError: Too many elements provided. Needed at most 512000, but received 800000
I then changed vocab_size in text_model.py and in build_vocab(filenames,vocab_dir,vocab_size=5000) in loader.py to 5000, retrained the word vectors, and ran text_train.py again, which fails with:
ValueError: Too many elements provided. Needed at most 320000, but received 500000
How can I solve this problem?
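
The numbers in the two error messages can be checked with simple arithmetic. The sketch below is one reading of them, not a confirmed diagnosis: it assumes each count is vocabulary size times embedding dimension, and uses the vocab_size of 5000 stated in the post to recover the two dimensions.

```python
from fractions import Fraction

# (needed, received) pairs copied from the two error messages.
first = (512000, 800000)
second = (320000, 500000)

# Both errors show the same ratio between received and needed elements.
ratio_first = Fraction(first[1], first[0])
ratio_second = Fraction(second[1], second[0])

# With vocab_size = 5000 (stated in the post for the second run):
vocab_size = 5000
dim_expected = second[0] // vocab_size   # dimension the model allocates
dim_provided = second[1] // vocab_size   # dimension of the supplied vectors

# The same dimensions would imply this vocabulary size in the first run.
implied_first_vocab = first[0] // dim_expected
print(ratio_first, dim_expected, dim_provided, implied_first_vocab)
```

If this reading holds, changing vocab_size cannot fix the error, because the ratio stays the same; the embedding dimension configured in the model and the dimension used when training the word vectors would need to agree.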

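Program is Killed when running text_train.py

Hello, when I run text_train.py the program is automatically Killed after finishing epoch 1. What could be the cause? Thanks for any answer.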

Which TensorFlow version does this use? I am on 2.x and many of the APIs are no longer supported. Thanks.

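How to predict the category of text that has no category label

The model uses a validation set and a test set, both of which are labeled. If I now have a batch of unlabeled data, how do I predict the category of each item?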
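
Labels are only needed to compute loss and accuracy; producing a prediction needs only the input ids. A minimal sketch of the two steps around the model call follows: padding the id sequence to a fixed length, and turning class probabilities into category names. The function names, the right-padding choice, the category list, and the probability values are all hypothetical, and the model's forward pass itself is left out.

```python
def pad_ids(ids, seq_length, pad_id=0):
    """Truncate or right-pad a list of word ids to exactly seq_length."""
    ids = ids[:seq_length]
    return ids + [pad_id] * (seq_length - len(ids))


def decode_predictions(prob_rows, categories):
    """Map each row of class probabilities to the name of its largest entry."""
    labels = []
    for row in prob_rows:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(categories[best])
    return labels


# Hypothetical category names and model outputs for two unlabeled texts.
categories = ["sports", "finance", "phone"]
probs = [[0.1, 0.7, 0.2], [0.2, 0.2, 0.6]]

x = pad_ids([5, 9, 2], seq_length=6)
predicted = decode_predictions(probs, categories)
```

The category list must be in the same order as the one used during training, otherwise the decoded names will not match what the model learned.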

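Why build a vocabulary again after generating the word vectors? It seems words get lost

word2vec training produces about 200,000 word vectors with 128 dimensions, but afterwards you build a new vocabulary of only 6,000 words. Doesn't the training data generated from those 6,000 words lose information from the original sentences? For example, the word "工商" (industry and commerce) may be important in the word vectors and in the source data, but if it is not in the vocabulary it never appears in the training input, which would hurt training. Could you explain? I am quite confused.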
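
How much is lost by truncating the vocabulary can be measured directly. The sketch below keeps the most frequent words and reports what fraction of all tokens in the corpus is still covered; the function names and the toy corpus are assumptions for illustration and do not reproduce the repository's own vocabulary builder.

```python
from collections import Counter


def top_k_vocab(token_lists, vocab_size):
    """Return the vocab_size most frequent tokens across all documents."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return [word for word, _ in counts.most_common(vocab_size)]


def coverage(token_lists, vocab):
    """Fraction of token occurrences that fall inside the vocabulary."""
    vocab = set(vocab)
    total = kept = 0
    for toks in token_lists:
        total += len(toks)
        kept += sum(1 for tok in toks if tok in vocab)
    return kept / total if total else 0.0


# Hypothetical toy corpus: "a" occurs 3 times, "b" and "c" once each.
corpus = [["a", "b", "a"], ["a", "c"]]
vocab = top_k_vocab(corpus, 1)
ratio = coverage(corpus, vocab)
```

Because word frequencies are heavily skewed, a small vocabulary often covers most token occurrences, but rare words that matter for a class are exactly the ones that get cut, so checking coverage per class can be informative.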

Error when running python text_test.py

Configuring CNN model...
Loading test data...
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.712 seconds.
Prefix dict has been built succesfully.
2018-10-16 01:15:17.947491: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-10-16 01:15:17.947539: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-10-16 01:15:17.947550: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Testing...
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
已經終止 (core dumped)

Does training actually use the Word2Vec vectors?

In text_train.py:

    x_train, y_train = process_file(config.train_filename, word_to_id, cat_to_id, config.seq_length)
    x_val, y_val = process_file(config.val_filename, word_to_id, cat_to_id, config.seq_length)

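The training data is produced by process_file, and the training inputs x are built from word_to_id (obtained from read_vocab()), which is just each word's position index in the generated vocabulary.

How can the trained Word2Vec vectors be used?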
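
Word ids and pretrained vectors are usually connected through an embedding matrix whose row i holds the vector of the word with id i. The sketch below builds such a matrix in plain Python. It assumes ids run from 0 to len(word_to_id) - 1 and that missing words get small random values; the function name, the initialization range, and the toy data are hypothetical, and how the matrix is then passed to the embedding layer depends on the framework.

```python
import random


def build_embedding_matrix(word_to_id, vectors, dim, seed=0):
    """Return a list of rows; row i is the vector for the word with id i."""
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in range(len(word_to_id))]
    for word, idx in word_to_id.items():
        if word in vectors:
            # Copy the pretrained vector for words that word2vec knows.
            matrix[idx] = list(vectors[word])
        else:
            # Words without a pretrained vector get small random values.
            matrix[idx] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
    return matrix


# Hypothetical toy data: 3 vocabulary words, 2-dimensional vectors.
word_to_id = {"<PAD>": 0, "phone": 1, "price": 2}
pretrained = {"phone": [0.1, 0.2], "price": [0.3, 0.4]}
matrix = build_embedding_matrix(word_to_id, pretrained, dim=2)
```

With this layout, looking up the ids produced by the vocabulary returns the Word2Vec vectors, so the index-based inputs and the pretrained vectors work together rather than replacing each other.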

Error when running text_test.py

============================= test session starts =============================
platform win32 -- Python 3.6.2, pytest-4.4.0, py-1.8.0, pluggy-0.9.0
rootdir: D:\lc_bs\text-cnn-master
collected 1 item

text_test.py FLoading test data...

text_test.py:38 (test)
def test():
    print("Loading test data...")
    t1=time.time()

    x_test,y_test=process_file(config.test_filename,word_to_id,cat_to_id,config.seq_length)

E NameError: name 'config' is not defined

text_test.py:42: NameError
[100%]

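After switching to a training set with only three labels, text_predict fails to load the model

After switching the training set I changed the num_classes parameter in text_model to 3 and also changed the labels in loader.
After training finished, loading the model with the text_predict file fails
at saver.restore(sess=session, save_path=save_path).
I have checked that save_path is correct.
Why does this happen? I cannot figure it out. QAQ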

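Problems when running train_word2vec.py and text_train.py

Hello, I have run gaussic's project before. Since then I have been looking for a model that trains a CNN on embeddings from Word2vec-trained word vectors, and I was lucky to find your project.
However, when I run train_word2vec.py on the THUCNews text I get
RuntimeError: you must first build vocabulary before training the model
I then downloaded your trained vector_word.txt, and running text_train.py gives
ValueError: zero-size array to reduction operation maximum which has no identity
How can these problems be solved?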
