
CCF-BDCI-Sentiment-Analysis-Baseline

1. Adapted from an existing open-source codebase.

2. The model splits the text into k segments, feeds each segment into the language model separately, and then joins the segment representations with a GRU on top. The benefit is that you can set a small max_length and a larger k to reduce GPU memory usage, since memory grows quadratically with sequence length but only linearly with k.
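The splitting step described above can be sketched in plain Python (the function and parameter names here are illustrative, not taken from the repository's code):

```python
# Hypothetical illustration of the k-segment split described above
# (function and argument names are ours, not from the repo).
def split_into_segments(token_ids, split_num, max_seq_length, pad_id=0):
    """Split a token id list into `split_num` segments of `max_seq_length`,
    padding the tail so every segment has the same length."""
    total = split_num * max_seq_length
    ids = token_ids[:total]                    # truncate to k * max_length
    ids = ids + [pad_id] * (total - len(ids))  # pad the remainder
    return [ids[i * max_seq_length:(i + 1) * max_seq_length]
            for i in range(split_num)]

segments = split_into_segments(list(range(10)), split_num=3, max_seq_length=4)
# 3 segments of length 4; the last segment is padded with zeros.
```

With split_num=4 and max_seq_length=128 this covers 512 tokens while each forward pass through the language model only attends over 128.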

| Model | Online F1 |
| --- | --- |
| Bert-base | 80.3 |
| Bert-wwm-ext | 80.5 |
| XLNet-base | 79.25 |
| XLNet-mid | 79.6 |
| XLNet-large | -- |
| Roberta-mid | 80.5 |
| Roberta-large (max_seq_length=512, split_num=1) | 81.25 |

Notes:

1) Effective sequence length = max_seq_length * split_num

2) Effective batch size = per_gpu_train_batch_size * number of GPUs

3) The results above were obtained with 4 GPUs, so the effective batch size is 4. With only a single GPU, set per_gpu_train_batch_size to 4 and use a smaller max_length.

4) If GPU memory is too small, set gradient_accumulation_steps. For example, with gradient_accumulation_steps=2 and batch size=4, each step runs twice with a batch size of 2 and updates only after accumulating gradients. This is equivalent to batch size=4 but twice as slow, and the number of iterations should be raised accordingly, i.e. set train_steps to 10000.
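The equivalence in note 4 between accumulated micro-batches and one large batch can be shown with a toy loss (plain Python; all names are ours, not from the repo):

```python
# Gradient of L(w) = mean((w*x - y)^2) w.r.t. w, for a toy 1-D model.
def grad(w, xs, ys):
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# One full batch of 4 examples.
full = grad(w, xs, ys)

# Two micro-batches of 2, gradients averaged before the weight update --
# this is what gradient_accumulation_steps=2 does.
accum = (grad(w, xs[:2], ys[:2]) + grad(w, xs[2:], ys[2:])) / 2

assert abs(full - accum) < 1e-12  # same update direction either way
```

The averaged micro-batch gradient matches the full-batch gradient exactly whenever the micro-batches are equally sized, which is why the update is equivalent while only half the batch is resident in memory at a time.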

The actual batch size can be checked in the training log, e.g.:

09/06/2019 21:03:41 - INFO - __main__ - ***** Running training *****
09/06/2019 21:03:41 - INFO - __main__ - Num examples = 5872
09/06/2019 21:03:41 - INFO - __main__ - Batch size = 4
09/06/2019 21:03:41 - INFO - __main__ - Num steps = 5000

Competition Description

See the competition website for the task description.

Download the Dataset

Download the dataset from the competition website and unzip it into the ./data directory.

Data Preprocessing

cd data
python preprocess.py
cd ..

Bert-base Model

bash run_bert.sh
# average predictions over the 5 folds
python combine.py --model_prefix ./model_bert --out_path ./sub.csv
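combine.py's internals aren't shown here; assuming it averages each fold's class probabilities before taking the argmax (a common choice for k-fold ensembling), the idea can be sketched as follows (all names and the data layout are hypothetical):

```python
# Hypothetical sketch of 5-fold prediction averaging; combine.py's actual
# implementation may differ, and these names are ours.
def average_folds(fold_probs):
    """fold_probs: list of per-fold probability matrices, each shaped
    [num_examples][num_classes]. Returns one predicted label per example."""
    n_folds = len(fold_probs)
    n_examples = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    labels = []
    for i in range(n_examples):
        avg = [sum(fold[i][c] for fold in fold_probs) / n_folds
               for c in range(n_classes)]
        labels.append(max(range(n_classes), key=lambda c: avg[c]))
    return labels

# Two folds, two examples, three classes.
fold_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
fold_b = [[0.2, 0.7, 0.1], [0.1, 0.6, 0.3]]
print(average_folds([fold_a, fold_b]))  # averaged argmax per example
```

Averaging probabilities before the argmax is usually more stable than majority-voting the per-fold labels, since it preserves each fold's confidence.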

Bert Whole Word Masking Model

Download the PyTorch weights from https://github.com/ymcui/Chinese-BERT-wwm and unzip them into the chinese_wwm_ex_bert directory.

bash run_bert_wwm_ext.sh
python combine.py --model_prefix ./model_bert_wwm_ext --out_path ./sub.csv

XLNet-mid Model

Download the PyTorch weights from https://github.com/ymcui/Chinese-PreTrained-XLNet and unzip them into the ./chinese_xlnet_mid/ directory.

bash run_xlnet.sh
python combine.py --model_prefix ./model_xlnet --out_path ./sub.csv

Roberta-mid Model

Download the TensorFlow weights from https://github.com/brightmart/roberta_zh and unzip them into the ./chinese_roberta/ directory.

mv chinese_roberta/bert_config_middle.json chinese_roberta/config.json
python -u -m pytorch_transformers.convert_tf_checkpoint_to_pytorch --tf_checkpoint_path chinese_roberta/ --bert_config_file chinese_roberta/config.json --pytorch_dump_path chinese_roberta/pytorch_model.bin
bash run_roberta.sh
python combine.py --model_prefix ./model_roberta --out_path ./sub.csv

Contributors

dependabot[bot], guoday

Issues

run_bert.py computes eval_loss incorrectly

During training I noticed the eval_loss curve looked very strange. It turns out the original pytorch_transformers puts the eval_loss computation inside the with torch.no_grad() block, so I changed the code to:

with torch.no_grad():
    tmp_eval_loss = model(input_ids=input_ids, token_type_ids=segment_ids, attention_mask=input_mask, labels=label_ids)
    logits = model(input_ids=input_ids, token_type_ids=segment_ids, attention_mask=input_mask)
    eval_loss += tmp_eval_loss.mean().item()

After that it looked normal. Could this be the cause?

/home/ming/anaconda3/lib/python3.7/site-packages/sklearn/metrics/classification.py:1439: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true samples.
'recall', 'true', average, warn_for)
test 0.06457949662369551

The F1 looks normal, but the test value is low and this warning appears. I've searched for a long time without finding a fix. What could be the cause?

Results on XLNet_zh_Large

Could you test and compare performance on XLNet_zh_Large?
(The current XLNet_zh_Large is an early-access release; we will help resolve any issues that come up.)

Error when running English RoBERTa

Hello. I tried to run the English RoBERTa model by using RobertaForSequenceClassification, RobertaConfig, and RobertaTokenizer from pytorch_transformers, but I get:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
What causes this? Thanks for your help.

Where in the code are the k text segments fed into the model separately?

Could you explain how a document split into k segments is fed in for training? I couldn't find the place where the segments are "fed into the language model separately". And if that is how it works, couldn't a document of any length be handled by splitting it into many segments, with no truncation at all?

In BertForSequenceClassification's forward:

def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None,
            position_ids=None, head_mask=None):

    flat_input_ids = input_ids.view(-1, input_ids.size(-1))
    flat_position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None
    flat_token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
    flat_attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None

For example with k=2, input_ids contains the document's two segments, and view flattens them again, so isn't the input length unchanged, the same as without splitting?
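For what it's worth, the view call in the snippet above merges the batch and segment dimensions rather than concatenating tokens: each of the batch*k rows still has length max_seq_length, so every segment is encoded independently and the quadratic attention cost applies per segment, not to the full document. A shape-only illustration (the numbers here are ours, not from the repo):

```python
import numpy as np

# input_ids arrives shaped [batch, k, max_seq_length].
batch, k, max_seq_length = 2, 3, 4
input_ids = np.arange(batch * k * max_seq_length).reshape(batch, k, max_seq_length)

# reshape(-1, last_dim) mirrors input_ids.view(-1, input_ids.size(-1)):
# it merges the batch and segment dimensions into batch*k independent
# sequences, each still of length max_seq_length.
flat = input_ids.reshape(-1, input_ids.shape[-1])
print(flat.shape)  # 6 rows of length 4, not 2 rows of length 12
```

So the input to the encoder is never a length-k*max_seq_length sequence; the segments only meet again when the per-segment representations are combined on top.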

Why does run_roberta.sh use run_bert.py?

Hello,

Why does run_roberta.sh use run_bert.py? Is it because the RoBERTa model can be fine-tuned with the BERT code?
I also tried running an ALBERT pretrained model with run_bert.py, but got a torch size mismatch: one tensor is [21128, 2048], the other is (21128, 128).

A question: can these models run on a laptop without a GPU?

I know they can run in principle; it's just a matter of time. Since I don't have much compute, I've had to skip many models entirely. Kaggle offers free GPUs and I'd like to try there, so I wanted to ask: how long does this model take to train for you?

Since the test set labels are all dummy zeros, why load them into the dataloader?

I hit a pitfall yesterday: PyTorch labels cannot be negative. Is that why the test set labels are set explicitly? Could we instead predict directly on unlabeled data? (Also, I found yesterday that if you download the PyTorch version of the weights, no conversion is needed; just run bash run_roberta.sh directly.)
