
cbert_aug's People

Contributors

1024er, crayon2019, crayonlyx2018


cbert_aug's Issues

Why can segment_ids be changed to label_ids?

I have read the "Conditional BERT" paper, but I have some questions about "altering segment ids to label ids".
Because I use huggingface/transformers, I cannot alter the segment ids directly. As follows:

huggingface/transformers token_type_ids
only use 0 or 1 to distinguish the two sequences.
But my task has multiple labels (e.g. 1, 2, 3, ..., 10), which I transfer into token_type_ids like [1,1,1,1,1,1,0,0,0,0,0] and [2,2,2,2,2,2,0,0,0,0,0].

Then I just pass these through the token_type_ids argument, as follows:

mlm_model = TFBertForMaskedLM.from_pretrained('bert-base-chinese')
mlm_model(inputs=inputs, attention_mask=attention_mask, token_type_ids=token_type_ids, labels=labels)

I get the following error:

InvalidArgumentError: indices[42,0] = 2 is not in [0, 2) [Op:ResourceGather] on token_type_ids

I want to check whether this method can still work.
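For reference, the error fires because the pretrained token-type embedding table only has 2 rows, so index 2 is out of range. The usual fix, which is also what the conditional-BERT idea amounts to, is to enlarge that table to one row per label. A minimal PyTorch sketch of the resize (the snippet above uses the TF classes, but the idea is the same; num_labels = 10 is an assumption matching the question):

import torch
from transformers import BertForMaskedLM

num_labels = 10  # assumption: ten classes, as described above
model = BertForMaskedLM.from_pretrained('bert-base-chinese')

# Replace the 2-row token-type table with a num_labels-row one so that
# label ids 0..num_labels-1 become valid token_type_ids.
old_emb = model.bert.embeddings.token_type_embeddings
new_emb = torch.nn.Embedding(num_labels, old_emb.embedding_dim)
new_emb.weight.data.normal_(mean=0.0, std=model.config.initializer_range)
new_emb.weight.data[:2] = old_emb.weight.data  # keep the pretrained rows
model.bert.embeddings.token_type_embeddings = new_emb
model.config.type_vocab_size = num_labels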

chainer version

Hello,
When I run cbert_augdata.py, I get an error related to chainer.
Maybe my chainer version is too old.
I tried Chainer 7.7, but I still got a version-related error.
Can you tell me which version I should choose?
Thank you ~

FileNotFoundError: [Errno 2] No such file or directory: 'aug_data_1_7_0_1.0/subj/train_origin.tsv'

Traceback (most recent call last):
File "cbert_augdata.py", line 187, in
main()
File "cbert_augdata.py", line 147, in main
shutil.copy(origin_train_path, save_train_path)
File "/data/Anaconda3/envs/lyc/lib/python3.6/shutil.py", line 245, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/data/Anaconda3/envs/lyc/lib/python3.6/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'aug_data_1_7_0_1.0/subj/train_origin.tsv'
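The missing file suggests the aug_data_* directory was never produced, i.e. cbert_augdata.py was likely run without the step that creates it. A hedged sketch of a guard around the failing copy (the source path is taken from the traceback; the save target here is a hypothetical example):

import os
import shutil

# Source path as it appears in the traceback above; adjust to your setup.
origin_train_path = 'aug_data_1_7_0_1.0/subj/train_origin.tsv'
save_train_path = 'aug_data_1_7_0_1.0/subj/train.tsv'  # hypothetical target

# Fail with a clearer message instead of a bare FileNotFoundError.
os.makedirs(os.path.dirname(save_train_path), exist_ok=True)
if not os.path.exists(origin_train_path):
    raise FileNotFoundError(
        origin_train_path + ' is missing; run the step that creates the '
        'aug_data_* directory (e.g. cbert_finetune.py) first')
shutil.copy(origin_train_path, save_train_path)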

More details about how SST-2 is prepared

The SST-2 dataset included in the repo contains 6,228 training samples, 692 validation samples, and 1,821 test samples. But the official SST-2 dataset (which can be accessed via torchtext) contains 6,920 binary-class training samples and 872 binary-class validation samples. Could you clarify the discrepancy?
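For comparison, the official binary split sizes are easy to reproduce with the torchtext 0.6 legacy API (a sketch; filtering out 'neutral' examples leaves the binary subset):

from torchtext import data, datasets

# Load the official SST splits and drop neutral examples to get SST-2.
TEXT = data.Field()
LABEL = data.Field(sequential=False)
train, val, test = datasets.SST.splits(
    TEXT, LABEL, filter_pred=lambda ex: ex.label != 'neutral')
print(len(train), len(val), len(test))  # expect 6920 872 1821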

Cbert for other language ?

Hello, can CBERT be used with other language models, for example CamemBERT or FlauBERT for French?
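If the scripts accept any BERT-style checkpoint name, a French-capable model can in principle be dropped in. A sketch under that assumption; note that CamemBERT is RoBERTa-style and FlauBERT is XLM-style, so they handle token_type_ids differently from BERT, and the label-as-segment trick relies on a real token-type embedding:

from transformers import BertForMaskedLM, BertTokenizer

# Multilingual BERT covers French and keeps the token_type embedding
# that the conditional-BERT label trick repurposes.
model_name = 'bert-base-multilingual-cased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)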

Question about the experimental procedure in the paper

Hi, I'd like to ask about something. In the code, every training epoch produces a new augmented dataset. Suppose the data augmented in epoch 3 improves the RNN+SST5 result; could the data from epoch 4 then make the RNN+SST5 training worse? Have you run experiments showing that if epoch 3 performs well, epochs 4, 5, and 6 will necessarily perform well too?

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)` (createCublasHandle at /pytorch/aten/src/ATen/cuda/CublasHandlePool.cpp:8)

Hi, when I run your code (python cbert_finetune.py), I get the following problem:

Traceback (most recent call last):
File "cbert_finetune.py", line 168, in
main()
File "cbert_finetune.py", line 151, in main
loss.backward()
File "/home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/autograd/init.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle) (createCublasHandle at /pytorch/aten/src/ATen/cuda/CublasHandlePool.cpp:8)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f409fe5f536 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0xf67ee5 (0x7f40a1222ee5 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0x94c (0x7f40a1223ccc in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xf5d5e1 (0x7f40a12185e1 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x14079bd (0x7f40a16c29bd in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #5: THCudaTensor_addmm + 0x5c (0x7f40a16cc56c in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0x1053a08 (0x7f40a130ea08 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xf76dc8 (0x7f40a1231dc8 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0x10c3ec0 (0x7f40dd807ec0 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #9: + 0x2c9b6fe (0x7f40df3df6fe in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #10: + 0x10c3ec0 (0x7f40dd807ec0 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #11: at::Tensor::mm(at::Tensor const&) const + 0xf0 (0x7f40dd3cbb70 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x28e6b6c (0x7f40df02ab6c in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::generated::MmBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x151 (0x7f40df02b971 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #14: + 0x2d89c05 (0x7f40df4cdc05 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f40df4caf03 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f40df4cbce2 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f40df4c4359 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f40ebc034d8 in /home1/wxzuo/anaconda3/envs/ccs/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #19: + 0xb8408 (0x7f40ecab7408 in /home1/wxzuo/anaconda3/lib/libstdc++.so.6)
frame #20: + 0x7e25 (0x7f41212c8e25 in /lib64/libpthread.so.0)
frame #21: clone + 0x6d (0x7f41206e1bad in /lib64/libc.so.6)

Here is my environment:
Package Version


certifi 2020.4.5.2
chardet 3.0.4
click 7.1.2
ConfigArgParse 1.2.3
cycler 0.10.0
Cython 3.0a5
dataclasses 0.7
decorator 4.1.2
dgl 0.4.3.post2
filelock 3.0.12
future 0.18.2
idna 2.9
joblib 0.15.1
kiwisolver 1.2.0
matplotlib 3.2.2
networkx 2.1
nltk 3.5
numpy 1.13.3
packaging 20.4
pandas 1.0.4
Pillow 7.1.2
pip 20.1.1
psutil 5.7.0
pycocotools 2.0
pyparsing 2.4.7
python-dateutil 2.8.1
pytz 2020.1
regex 2020.6.8
requests 2.23.0
sacremoses 0.0.43
scikit-learn 0.23.1
scipy 1.4.1
sentencepiece 0.1.91
setuptools 36.4.0
six 1.15.0
sklearn 0.0
stanfordcorenlp 3.9.1.1
threadpoolctl 2.1.0
tokenizers 0.7.0
torch 1.5.0
torchtext 0.6.0
torchvision 0.6.0
tqdm 4.46.1
transformers 2.11.0
urllib3 1.25.9
wheel 0.29.0

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Could you please tell me how to solve this problem? Thanks.
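CUBLAS_STATUS_ALLOC_FAILED during cublasCreate usually means the GPU could not allocate memory (out of memory, or another process holding the device); an out-of-range index fed to an embedding layer (e.g. a label id not covered by the resized token-type table) can also corrupt the CUDA context and surface this way. A debugging sketch, not a confirmed fix:

import os

# Make kernel launches synchronous so the failing op is reported at the
# exact line instead of a later cuBLAS call; must be set before CUDA init.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch

print(torch.cuda.is_available(), torch.cuda.get_device_name(0))
# How much memory this process already holds on GPU 0:
print(torch.cuda.memory_allocated(0), torch.cuda.memory_reserved(0))
# If memory is the issue, try reducing the batch size in cbert_finetune.py.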

global.config not being used everywhere

The line that copies which dataset (task) to use is commented out:

#args.task_name = configs_dict.get("dataset")

This means that if you clone the repo and run:

python finetune_dataset.py
python aug_dataset.py

you will get an error, because one of the commands runs for the rt-polarity dataset and the other for the subj dataset.

Besides this, maybe some other parameters should also be copied from global.config?
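For reference, a minimal sketch of what using global.config consistently could look like, assuming (as the commented-out line suggests) that the file parses into a configs_dict; the JSON format here is an assumption:

import json
from types import SimpleNamespace

# Read the shared settings once so both scripts agree on the dataset.
with open('global.config') as f:
    configs_dict = json.load(f)

args = SimpleNamespace()  # stands in for the scripts' parsed arguments
args.task_name = configs_dict.get('dataset')  # the line that is commented out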
