moneydboat / data_grand

Top-10 solution for the 2018 DataGrand Cup Text Intelligent Processing Challenge (10th of 3,830)

Python 87.64% Shell 12.36%


data_grand's Issues

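A question about Vectors in torchtext

When I run the main function, the data-loading step fails with the following error: no vectors found at .vector_cache/datasets/emb/word_300.txt
I don't fully understand how Vectors is used in the load_data method of data.py, and I couldn't find an explanation after searching. The code contains these two lines: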
cache = '.vector_cache'
vectors = Vectors(name=embedding_path, cache=cache)

If my embedding_path directory is 'datasets/emb',
should I put the pretrained word2vec vectors in datasets/emb or in .vector_cache/datasets/emb?
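
The path in the error message suggests that the loader joined the cache directory with the name argument. The sketch below mimics that lookup with a hypothetical helper, resolve_vector_path, which is not part of torchtext. The shortcut for a name that already points to an existing file is an assumption about some torchtext releases and should be checked against the installed version.

```python
import os


def resolve_vector_path(name, cache=".vector_cache"):
    """Hypothetical helper: guess where the vector file is looked up.

    Assumption: an existing file passed as `name` is used directly;
    otherwise the file is expected at <cache>/<name>, which matches the
    path printed in the error message.
    """
    if os.path.isfile(name):
        return name
    return os.path.join(cache, name)


# If the file is missing from the working directory, the lookup falls
# back to the cache directory.
print(resolve_vector_path("datasets/emb/word_300.txt"))
```

Under these assumptions, either placing the file at the printed location or passing a name that resolves to an existing file would avoid the error.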

How was the truncation length chosen?
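
The thread does not record how the author picked the value. One common heuristic, shown here only as a sketch, is to choose the length that covers a fixed share of the training samples. The helper name length_at_coverage and the token counts are made up for illustration.

```python
import math


def length_at_coverage(lengths, coverage=0.95):
    """Return the smallest observed length that covers `coverage` of the samples."""
    ordered = sorted(lengths)
    rank = math.ceil(coverage * len(ordered))
    index = min(len(ordered) - 1, max(0, rank - 1))
    return ordered[index]


# Hypothetical token counts per document.
token_counts = [120, 340, 560, 780, 900, 1500, 2300, 4100, 8000, 20000]
print(length_at_coverage(token_counts, coverage=0.5))  # 900
```

A higher coverage keeps more of each long document at the cost of memory and training time, so the final value is usually tuned on the validation set.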

val_set.csv

split_val.py doesn't show how val_set.csv is generated either. Do I need to write my own code to generate the validation set?
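
The thread does not say how val_set.csv was produced. A minimal sketch of a reproducible hold-out split is given below; split_rows, the 20% ratio, and the sample rows are assumptions for illustration, not the repository's procedure.

```python
import random


def split_rows(rows, val_ratio=0.1, seed=42):
    """Shuffle rows reproducibly and hold out a fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical rows standing in for the contents of train_set.csv.
rows = [{"id": i, "label": i % 3} for i in range(20)]
train_rows, val_rows = split_rows(rows, val_ratio=0.2)
print(len(train_rows), len(val_rows))  # 16 4
```

The two lists could then be written out with csv.DictWriter from the standard library, one file for training and one named val_set.csv.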

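A question

Hello, a beginner's question here.

I've read through the code roughly. Has the split raw data already been truncated and padded? In the model code that follows, I don't see any special handling of padding (for example in the attention softmax or the LSTM's hn). Is that because it has little effect, or for some other reason?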
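
For reference, the kind of padding-aware handling the question refers to can be sketched in plain Python. The function masked_softmax below is a hypothetical illustration of masking padded positions before normalizing attention scores; it is not code from the repository.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores` that gives zero weight where `mask` is falsy."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        return [0.0] * len(scores)
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# The last position is padding, so it receives no attention weight.
weights = masked_softmax([1.0, 2.0, 3.0, 0.0], [1, 1, 1, 0])
print(weights)
```

Without the mask, padded positions would take a share of the attention weight; how much that matters in practice depends on how many padding tokens a typical sample contains.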

Training word2vec word vectors

I get an error when I train word vectors with tran_emb.py. What could be the cause?

Why does load_data in data.py define the data paths twice?

def load_data(opt):
    # do not set fix_length
    TEXT = data.Field(sequential=True, fix_length=opt.max_text_len)  # words or characters
    LABEL = data.Field(sequential=False, use_vocab=False)

    # load
    # word/ or article/
    train_path = opt.data_path + opt.text_type + '/train_set.csv'
    val_path = opt.data_path + opt.text_type + '/val_set.csv'
    test_path = opt.data_path + opt.text_type + '/test_set.csv'
    train_path = 'D:/git/dataset/val_set.csv'
    test_path = 'D:/git/dataset/val_set.csv'
    val_path = 'D:/git/dataset/val_set.csv'

About the TextCNN model

How was the kernel_size of the one-dimensional MaxPool1d here chosen? Shouldn't it be opt.max_text_len - kernel_size + 1?
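
The expression in the question is the output length of a single stride-1, unpadded convolution. The sketch below evaluates the general Conv1d output-length formula so the pooling window can be checked against any layer stack; conv1d_out_len is a hypothetical helper, and the two-layer example is an assumption rather than a description of the repository's model.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution for the given input length."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


max_text_len, kernel_size = 100, 3
after_one = conv1d_out_len(max_text_len, kernel_size)
after_two = conv1d_out_len(after_one, kernel_size)
print(after_one, after_two)  # 98 96
```

A pooling window that reduces the sequence to a single value has to equal the length that actually reaches the pooling layer, so each additional stride-1 convolution shrinks it by a further kernel_size - 1.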

About kmax_pooling

Is kmax_pooling in RNNs widely used these days? What if this step is skipped and the last time step is taken directly at out = self.bilstm(embed)[0].permute(1, 2, 0)?
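
To make the comparison concrete, here is a plain-Python sketch of k-max pooling over one feature channel: it keeps the k largest activations in their original time order, whereas taking the last time step keeps a single value. The name kmax_pooling mirrors the question, but this list-based version is an illustration, not the repository's tensor implementation.

```python
def kmax_pooling(values, k):
    """Keep the k largest values, preserving their original order."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


# Hypothetical activations of one channel across five time steps.
channel = [0.1, 0.9, 0.3, 0.7, 0.2]
print(kmax_pooling(channel, 2))  # [0.9, 0.7]
print(channel[-1])               # 0.2
```

In this example the last step holds a weak activation, while k-max pooling still retains the two strongest ones from earlier in the sequence.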

Could you provide code for testing a single sample?

Could the author provide code for testing a single data sample? Thanks.
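
No such script appears in the thread. As a partial sketch, the preprocessing for one sample could look like the following, assuming whitespace-separated tokens and a token-to-index dictionary such as the one a vocabulary exposes. The function encode_sample, the toy vocabulary, and the unk/pad indices are all hypothetical.

```python
def encode_sample(text, stoi, max_len, unk_id=0, pad_id=1):
    """Map one whitespace-tokenized sample to a fixed-length list of ids."""
    tokens = text.split()[:max_len]                  # truncate
    ids = [stoi.get(tok, unk_id) for tok in tokens]  # unknown tokens map to unk_id
    return ids + [pad_id] * (max_len - len(ids))     # pad on the right


# Toy vocabulary with made-up token strings.
vocab = {"<unk>": 0, "<pad>": 1, "816903": 2, "597526": 3}
print(encode_sample("816903 597526 520477", vocab, max_len=5))  # [2, 3, 0, 1, 1]
```

The resulting id list would still need to be converted to a tensor with the layout the trained model expects before it can be passed to the model for prediction.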

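About import word2vec

Is this word2vec a package you wrote yourself? Installing it directly with pip install word2vec failed for me.
The word2vec link you gave points to a GitHub repository. I downloaded Wordvec from it but couldn't find a load attribute (I saw word2vec.load(path) used above). Thanks for any clarification.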
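
If the package cannot be installed, vectors saved in the plain-text word2vec format can be read without it. The sketch below assumes that format (a header line with vocabulary size and dimension, then one token per line followed by its values); load_text_vectors is a hypothetical helper, not the package's load function.

```python
def load_text_vectors(lines):
    """Parse word2vec text format from an iterable of lines."""
    it = iter(lines)
    _declared_size, dim = (int(x) for x in next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Tiny in-memory example standing in for a real vector file.
sample = ["2 3", "hello 0.1 0.2 0.3", "world 0.4 0.5 0.6"]
dim, vectors = load_text_vectors(sample)
print(dim, sorted(vectors))  # 3 ['hello', 'world']
```

For a file on disk, an open file handle can be passed in place of the list, since the function only iterates over lines.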

ValueError: Fan in and fan out can not be computed for tensor with less than 2 dimensions

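What causes this? The error is raised at TEXT.build_vocab(train, vectors=vectors).
The format of vectors looks correct, and I'm on PyTorch 0.4.1. I don't know how to fix it yet; any help is appreciated.

I found ShuangXieIrene/ssds.pytorch#9, but it didn't solve the problem.

Error output: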
270it [00:00, 17380.78it/s]<torchtext.vocab.Vectors object at 0x7f80a89e4cf8>
read data from /home/lxy/new20g_disk/3a_research/pytorch_learning/util/word/train_set.csv

Traceback (most recent call last):

File "", line 1, in
runfile('/home/lxy/new20g_disk/3a_research/pytorch_Test/vectors.py', wdir='/home/lxy/new20g_disk/3a_research/pytorch_Test')

File "/home/lxy/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
execfile(filename, namespace)

File "/home/lxy/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/home/lxy/new20g_disk/3a_research/pytorch_Test/vectors.py", line 79, in
TEXT.build_vocab(train, vectors=vectors)

File "/home/lxy/anaconda3/lib/python3.5/site-packages/torchtext/data/field.py", line 273, in build_vocab
self.vocab = self.vocab_cls(counter, specials=specials, **kwargs)

File "/home/lxy/anaconda3/lib/python3.5/site-packages/torchtext/vocab.py", line 88, in __init__
self.load_vectors(vectors, unk_init=unk_init, cache=vectors_cache)

File "/home/lxy/anaconda3/lib/python3.5/site-packages/torchtext/vocab.py", line 159, in load_vectors
self.vectors[i][start_dim:end_dim] = v[token.strip()]

File "/home/lxy/anaconda3/lib/python3.5/site-packages/torchtext/vocab.py", line 286, in __getitem__
return self.unk_init(torch.Tensor(self.dim))

File "/home/lxy/anaconda3/lib/python3.5/site-packages/torch/nn/init.py", line 218, in xavier_uniform_
fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)

File "/home/lxy/anaconda3/lib/python3.5/site-packages/torch/nn/init.py", line 181, in _calculate_fan_in_and_fan_out
raise ValueError("Fan in and fan out can not be computed for tensor with less than 2 dimensions")
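
The traceback shows unk_init being called on a one-dimensional tensor (torch.Tensor(self.dim)) and ending in xavier_uniform_, which needs at least two dimensions to compute fan-in and fan-out. One possible workaround, sketched below, swaps in an initializer that accepts 1-D tensors. The name uniform_unk_init and the 0.25 bound are illustrative assumptions, not the repository's code.

```python
import torch


def uniform_unk_init(vector, bound=0.25):
    """Fill a 1-D embedding vector in place with values from U(-bound, bound)."""
    return torch.nn.init.uniform_(vector, -bound, bound)


# A 1-D tensor like the one created for an out-of-vocabulary token.
vec = uniform_unk_init(torch.empty(300))
print(vec.shape)  # torch.Size([300])
```

Assuming the installed torchtext exposes an unk_init argument on Vectors, as the traceback suggests, a function like this could be passed there in place of xavier_uniform_.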