
ganbert-pytorch's Introduction

GAN-BERT (in PyTorch and compatible with HuggingFace)

This is a PyTorch (and HuggingFace) implementation of the GAN-BERT method from https://github.com/crux82/ganbert, which is available in TensorFlow. While the original GAN-BERT was an extension of BERT, this implementation can be adapted to several architectures, ranging from RoBERTa to ALBERT!

IMPORTANT: since this implementation differs slightly from the original TensorFlow one, some results may vary. Any feedback or suggestion for improving this first version would be appreciated.


This is the code for the paper "GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples", published as a short paper at ACL 2020 by Danilo Croce (Tor Vergata, University of Rome), Giuseppe Castellucci (Amazon) and Roberto Basili (Tor Vergata, University of Rome).

GAN-BERT is an extension of BERT that uses a Generative Adversarial setting to implement an effective semi-supervised learning scheme. It allows training BERT with datasets composed of a limited amount of labeled examples and larger amounts of unlabeled material. GAN-BERT can be used in sequence classification tasks (also involving text pairs).

As in the original TensorFlow implementation, this code runs the GAN-BERT experiment on the TREC dataset for the fine-grained Question Classification task. This package provides the code as well as the data for running an experiment that uses 2% of the labeled material (109 examples) and 5343 unlabeled examples. The test set is composed of 500 annotated examples.

The Model

GAN-BERT is an extension of the BERT model within the Generative Adversarial Network (GAN) framework (Goodfellow et al., 2014). In particular, the Semi-Supervised GAN (Salimans et al., 2016) is used to make BERT fine-tuning robust in training scenarios where obtaining annotated material is problematic. When fine-tuned with very few labeled examples, the BERT model cannot reach sufficient performance. With GAN-BERT we extend the fine-tuning stage by introducing a Discriminator-Generator setting, where:

  • the Generator G is devoted to producing "fake" vector representations of sentences;
  • the Discriminator D is a BERT-based classifier over k+1 categories.

[Figure: the GAN-BERT model]

D has the role of classifying an example with respect to the k categories of the task of interest, and it should also recognize the examples generated by G (the k+1-th category). G, instead, must produce representations as similar as possible to the ones the model produces for "real" examples. G is penalized when D correctly classifies an example as fake.
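Concretely, both components can be implemented as small feed-forward networks on top of the transformer. Below is a minimal sketch, assuming a BERT-base hidden size of 768; the layer sizes and module structure here are illustrative assumptions, not the exact code of this repository.

import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, noise_size=100, hidden_size=768):
        super().__init__()
        # Maps a noise vector to a "fake" sentence representation with
        # the same dimensionality as the BERT sentence embedding.
        self.layers = nn.Sequential(
            nn.Linear(noise_size, hidden_size),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, noise):
        return self.layers(noise)

class Discriminator(nn.Module):
    def __init__(self, hidden_size=768, num_labels=2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.1),
        )
        # k task categories plus one extra category for "fake" (k+1).
        self.classifier = nn.Linear(hidden_size, num_labels + 1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, reps):
        features = self.backbone(reps)   # also reused for feature matching
        logits = self.classifier(features)
        probs = self.softmax(logits)
        return features, logits, probs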

In this context, the model is trained on both labeled and unlabeled examples. The labeled examples contribute to the loss computed over the k categories of the task. The unlabeled examples contribute to the loss only in that they should not be classified as belonging to the k+1-th category (i.e., the fake category).
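In code, this amounts to computing the cross-entropy over the k task logits only and discarding the loss of unlabeled rows. A simplified, self-contained sketch with toy shapes (variable names mirror the notebook snippets quoted in the issues below):

import torch
import torch.nn.functional as F

# Toy setup: batch of 4, k = 3 task classes, so D emits k+1 = 4 logits.
logits = torch.randn(4, 4)
b_labels = torch.tensor([0, 2, 1, 0])                    # dummy labels for unlabeled rows
b_label_mask = torch.tensor([True, True, False, False])  # last two rows are unlabeled

log_probs = F.log_softmax(logits[:, 0:-1], dim=-1)  # drop the (k+1)-th "fake" logit
label2one_hot = F.one_hot(b_labels, num_classes=3)
per_example_loss = -torch.sum(label2one_hot * log_probs, dim=-1)
# Keep only the losses of labeled examples; unlabeled rows are masked out.
per_example_loss = torch.masked_select(per_example_loss, b_label_mask)
labeled_count = max(per_example_loss.numel(), 1)    # guard against all-unlabeled batches
supervised_loss = torch.sum(per_example_loss) / labeled_count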

The resulting model is shown to learn text classification tasks starting from very few labeled examples (50-60 examples) and to outperform classically fine-tuned BERT models by a large margin in this setting.

More details are available at https://github.com/crux82/ganbert

Citation

If this software is useful for your research, please cite the following paper:

@inproceedings{croce-etal-2020-gan,
    title = "{GAN}-{BERT}: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples",
    author = "Croce, Danilo  and
      Castellucci, Giuseppe  and
      Basili, Roberto",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.191",
    pages = "2114--2119"
}

Acknowledgments

We would like to thank Osman Mutlu and Ali Hürriyetoğlu for their implementation of GAN-BERT in PyTorch, which inspired our porting. You can find their initial repository at this link. We would also like to thank Claudia Breazzano (Tor Vergata, University of Rome), who supported this porting.

ganbert-pytorch's People

Contributors

crux82


ganbert-pytorch's Issues

Support for validation set instead of evaluating on test set directly.

I see that you only evaluate on the test set at each epoch. Can we add a validation set, with early-stopping criteria based on the results/loss on this validation set? This would also require a way to checkpoint the whole model, in order to save the best model configuration against the dev set and use it against the test set at the end of training.

Please let me know if we can add:
1. dev-set support with early-stopping criteria;
2. checkpointing logic, to save and load the model.

One last question: can you provide a way to train only the base model (BERT-based) without the GAN components, so that I can take those numbers as a reference? Then I could report what the BERT-based model alone achieves on these benchmarks, and what we get after adding the GAN.
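(Not part of the original thread; a minimal early-stopping sketch under stated assumptions: train_one_epoch, evaluate, save_checkpoint, and dev_dataloader are hypothetical helpers, not functions of this repository.)

best_dev_loss = float("inf")
patience = 3            # epochs without improvement before stopping
patience_left = patience

for epoch in range(num_train_epochs):
    train_one_epoch()                      # assumed: one GAN-BERT training epoch
    dev_loss = evaluate(dev_dataloader)    # assumed: returns the average dev loss
    if dev_loss < best_dev_loss:
        best_dev_loss = dev_loss
        patience_left = patience
        save_checkpoint("ganbert_best.pt") # assumed: see the checkpointing sketch below
    else:
        patience_left -= 1
        if patience_left == 0:
            break                          # no dev improvement for `patience` epochs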

Model errors when trying to change the dataset

I am trying to run the notebook on a different dataset (QADI) and changed the data to the same format specified in the example dataset.

I just changed the get_qc_examples function to directly read the line and class from the tab-separated file:

def get_qc_examples(input_file):
    """Creates examples for the training and dev sets."""
    examples = []

    with open(input_file, 'r') as f:
        # Skip the header line, then read "label<TAB>text" rows.
        for line in f.read().splitlines()[1:]:
            split = line.split("\t")
            text_a = split[1]
            label = split[0]
            examples.append((text_a, label))

    return examples

I also added some unlabeled data in the same file format, with the label set to UNK_UNK, and changed the label list to:

label_list = ['UNK_UNK','Algeria', 'Bahrain', 'Egypt', 'Iraq', 'Jordan', 'Kuwait', 'Lebanon', 'Libya', 'Morocco', 'Oman', 'Palestine', 'Qatar', 'Saudi_Arabia', 'Sudan', 'Syria', 'Tunisia', 'United_Arab_Emirates', 'Yemen']

but I am getting an error at the end of the epoch (after training on the whole epoch):

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-3e8566791cab> in <module>()
    111         # so the loss evaluated for unlabeled data is ignored (masked)
    112         label2one_hot = torch.nn.functional.one_hot(b_labels, len(label_list))
--> 113         per_example_loss = -torch.sum(label2one_hot * log_probs, dim=-1)
    114         per_example_loss = torch.masked_select(per_example_loss, b_label_mask.to(device))
    115         labeled_example_count = per_example_loss.type(torch.float32).numel()

RuntimeError: The size of tensor a (63) must match the size of tensor b (64) at non-singleton dimension 0

Am I doing something wrong? Can you help me tackle this issue?
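(Not part of the original thread; one plausible cause, stated as an assumption rather than a confirmed diagnosis: the last batch holds 63 examples while some tensor in the loop, e.g. the generator noise, is still built with the fixed batch size of 64. Two common workarounds, sketched with the notebook's variable names:)

import torch
from torch.utils.data import DataLoader

# Workaround 1: drop the last partial batch so every batch has exactly
# `batch_size` examples (train_dataset / batch_size names are assumed).
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True,
                              drop_last=True)

# Workaround 2: size the generator noise from the actual batch instead of
# the configured batch size (b_input_ids, noise_size, device as in the notebook).
real_batch_size = b_input_ids.shape[0]
noise = torch.zeros(real_batch_size, noise_size, device=device).uniform_(0, 1)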

Accuracy problem

I am very happy to see such excellent work, and I am very interested in your article.
But when I use your code without any changes, the results vary across runs, and it is difficult to reach an accuracy of 65.4%. Why is this?
Thank you very much.

Normal distribution

The paper says "G inputs consist of noise vectors drawn from a normal distribution N(0, 1)", but this implementation seems to be based on uniform random inputs. Perhaps they give similar results :)
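(For reference, the two options differ by a single call; batch_size and noise_size follow the notebook's names:)

import torch

batch_size, noise_size = 64, 100
# Uniform noise in [0, 1), as this implementation appears to use:
noise_uniform = torch.rand(batch_size, noise_size)
# Standard normal noise N(0, 1), as described in the paper:
noise_normal = torch.randn(batch_size, noise_size)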

Number of discriminator output

Hello,
Thanks for sharing your code. I am trying to understand the GAN-BERT paper, and I ran into something in the code whose reason is not clear to me.

In the Python notebook, we have a dataset with 50 classes; "label_list" gives the name of each class, and "UNK_UNK" is added for the unlabeled data. Therefore, wherever len(label_list) is used, it is equal to 51.

On the other side, the discriminator's output size is the number of classes + 1, since the discriminator should not only discriminate between real and fake examples but also classify the real ones. Therefore, if an example is fake, it should be classified as class 51; otherwise, the discriminator assigns a task class to the example.

Here is what I do not understand: when we initialize the discriminator, we use len(label_list) = 51 as the input argument for the number of classes. Inside the discriminator, +1 is added to the 51, so the discriminator output size is 52.

Then, when the supervised loss is calculated in the training phase, we take logits = D_real_logits[:,0:-1], i.e., all the output logits except the last one (which relates to the example being fake or real and is used to calculate the unsupervised loss). Here logits = D_real_logits[:,0:-1] has length 51 while we have 50 classes. Also, at evaluation time, when the test predictions and loss are calculated, filtered_logits takes all the output logits except the last one and has length 51 while we have 50 labels. Is there a problem with the code, or did I not fully understand the paper?

Thanks!
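(Not part of the original thread; a small sketch of the shapes in question, assuming the notebook's 50 task classes plus UNK_UNK:)

import torch

len_label_list = 51                                 # 50 task classes + "UNK_UNK"
D_real_logits = torch.randn(8, len_label_list + 1)  # discriminator adds +1 -> 52 logits
filtered_logits = D_real_logits[:, 0:-1]            # drops only the last (real/fake) logit
print(filtered_logits.shape)                        # torch.Size([8, 51]): the UNK_UNK slot remains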

Deploy the model to use with live data

I have been trying to test and use the model, but I am unable to wrap it in a function that generates a prediction for real/new/live data. I have made some tweaks to the model but haven't been able to really use it for testing.

Is the number of unlabeled examples related to the model accuracy?

Hi,

First, thank you for your great contribution. From the paper, I understand that unlabeled examples improve or generalize the inner representation. But I wonder whether the number of unlabeled examples affects the model accuracy. In the experiment, you use 5343 unlabeled examples, which is relatively large compared to the labeled and test examples. Have you tried any experiments on this?

Is it possible to obtain the text representation of generated fake examples?

Thank you for this PyTorch implementation! I'm quite new to GANs and curious whether it's possible to obtain a text representation of the generated examples. I thought this would be possible, since GANs applied to computer vision tasks can generate images (e.g. human faces). Hoping to hear from you soon :)

Calculate F1

Is it possible to calculate the F1-score of the model?
As far as I can tell, currently only accuracy and loss are reported for the completed model.
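(Not part of the original thread; a minimal sketch with scikit-learn, assuming all_preds and all_labels_ids are the NumPy arrays collected by the notebook's evaluation loop:)

from sklearn.metrics import f1_score

# all_preds / all_labels_ids are assumed to be the arrays produced by
# the evaluation loop in the notebook.
macro_f1 = f1_score(all_labels_ids, all_preds, average="macro")
micro_f1 = f1_score(all_labels_ids, all_preds, average="micro")
print(f"Macro-F1: {macro_f1:.3f}  Micro-F1: {micro_f1:.3f}")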

Saving Best model during training

Hi, this is super useful, thanks.
Is there a way to save the best model during training? I tried to follow your AILC_Lectures_2021_Training_BERT_based_models_in_few_lines_of_code.ipynb, but it does not work here, since GAN-BERT has both a generator and a discriminator.
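(Not part of the original thread; since GAN-BERT has three trainable modules, one option is to bundle their state dicts into a single checkpoint. A sketch, assuming the notebook's transformer, generator, and discriminator objects:)

import torch

# Save: bundle the three state dicts (object names follow the notebook).
torch.save({
    "transformer": transformer.state_dict(),
    "generator": generator.state_dict(),
    "discriminator": discriminator.state_dict(),
}, "ganbert_best.pt")

# Load the best configuration back before the final test-set evaluation.
ckpt = torch.load("ganbert_best.pt", map_location=device)
transformer.load_state_dict(ckpt["transformer"])
generator.load_state_dict(ckpt["generator"])
discriminator.load_state_dict(ckpt["discriminator"])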

Unable to reproduce the results for 20-News dataset

Thanks for the excellent project! I'm new to GANs and was trying to reproduce the results reported in the paper for the 20News dataset. However, my test accuracy is stuck at about 5.2%, no matter whether I use a 1% or a 10% labelled training dataset (I tried 1%, 2%, and 10%-50%, with almost the same results). Also, the generator training loss is extremely big, up to 1343123137304389. I also used my own dataset with different ratios of labelled data, and the highest accuracy I got is only 38%.

I am just wondering: has anyone been able to reproduce the results, or does anyone know what is going wrong?
I trained the 20News dataset for 15 epochs with lr = 5e-6, dropout = 0.1, noise_size = 100, max_seq_length = 256, and batch size = 64.
I appreciate your help!

Feature matching loss

It seems there is a tiny mismatch between the paper's formula and the code for the feature matching loss.

This is not exactly a squared L2 norm:

g_feat_reg = torch.mean(torch.pow(torch.mean(D_real_features, dim=0) - torch.mean(D_fake_features, dim=0), 2))

It should instead be:

g_feat_reg = torch.sum(torch.pow(torch.mean(D_real_features, dim=0) - torch.mean(D_fake_features, dim=0), 2))

But I guess it doesn't change the outcome, other than effectively rescaling the learning rate.

Train accuracy in evaluation mode decreases every epoch

I run the same per-epoch evaluation that is done for the test set, but on the train set. However, the training accuracy decreases while the train loss increases. What happened here?

    # ========================================
    #      TEST ON THE TRAINING DATASET
    # ========================================
    # After the completion of each training epoch, measure our performance on
    # our training set.

    print("")
    print("Running Training...")

    t1 = time.time()
    
    # Tracking variables for train set
    total_train_loss = 0
    all_train_preds = []
    all_train_labels_ids = []
    nll_train_loss = torch.nn.CrossEntropyLoss(ignore_index=-1) #loss

    # Evaluate data for one epoch
    for batch in train_dataloader:
        
        # Unpack this training batch from our dataloader. 
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        
        # Tell pytorch not to bother with constructing the compute graph during
        # the forward pass, since this is only needed for backprop (training).
        with torch.no_grad():        
            model_outputs = transformer(b_input_ids, attention_mask=b_input_mask)
            hidden_states = model_outputs[-1]
            _, logits, probs = discriminator(hidden_states)
            filtered_logits = logits[:,0:-1]
            # Accumulate the train loss.
            total_train_loss += nll_train_loss(filtered_logits, b_labels)
            
        # Accumulate the predictions and the input labels
        _, preds = torch.max(filtered_logits, 1)
        all_train_preds += preds.detach().cpu()
        all_train_labels_ids += b_labels.detach().cpu()

    # Report the final accuracy for this pass over the training set.
    all_train_preds = torch.stack(all_train_preds).numpy()
    all_train_labels_ids = torch.stack(all_train_labels_ids).numpy()
    train_accuracy = np.sum(all_train_preds == all_train_labels_ids) / len(all_train_preds)
    print("  Train Accuracy: {0:.3f}".format(train_accuracy))

    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(train_dataloader)
    avg_train_loss = avg_train_loss.item()
    
    # Measure how long the pass over the training set took.
    train_time = format_time(time.time() - t1)
    
    print("  Train loss: {0:.3f}".format(avg_train_loss))
    print("  Train took: {:}".format(train_time))`

Generator doesn't work

Hi, how are you?
Thanks for this great work. I have one question: I ran the PyTorch code and the generator doesn't seem to work. That is, only the model was trained and saved; I then fed random noise to the generator and had the discriminator classify the resulting representations, but it always assigns one single label or very few labels. I tried many things, such as changing the noise to a normal distribution and adding more layers, but it looks like the generator doesn't learn.

Could you please give some guidance or working code? Thanks a lot!

Looking forward to hearing from you soon!
