alibaba/self-augmentation-strategy: issue discussion

Downloaded model-type=deberta but generated model-type=sas

I generated a model using your pretraining script, but the config of the generated model differs significantly from that of the model downloaded from your GitHub page. The SAS_DA_base model has model_type "deberta" and architecture "SADebertaForPretraining". I downloaded this model and used it as the starting checkpoint for the pretraining script. The model produced by the pretraining script has model_type "sas" and architecture "SasForPreTraining".

When I try to load the second model with Hugging Face Transformers, it reports that there is no match for model type "sas". If I use the deberta model type instead, I get the warning "You are using a model of type sas to instantiate a model of type deberta. This is not supported for all configurations of models and can yield errors." followed by a list of unused weights (which appears to be all of them).
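One plausible reading of the "weights not used" list, offered as an assumption rather than a diagnosis: when Transformers loads a checkpoint into a model, it matches parameter names in the checkpoint against the names the model defines and reports checkpoint entries with no counterpart as unused. If the SAS checkpoint stores its parameters under different names than DeBERTa expects, nothing matches. The sketch below is a simplified stand-in for that matching (plain set differences, not Transformers' actual loading code), and every parameter name in it is hypothetical.

```python
# Simplified name matching between a checkpoint and a model (not Transformers' code).


def split_keys(checkpoint_keys, expected_keys):
    """Return (unused, missing): checkpoint keys with no matching model
    parameter, and model parameters the checkpoint does not provide."""
    ckpt, expected = set(checkpoint_keys), set(expected_keys)
    return sorted(ckpt - expected), sorted(expected - ckpt)


# Hypothetical names: a checkpoint saved under a "sas." prefix, loaded into
# a model whose parameters live under a "deberta." prefix.
checkpoint = [
    "sas.embeddings.word_embeddings.weight",
    "sas.encoder.layer.0.output.dense.weight",
]
deberta_expected = [
    "deberta.embeddings.word_embeddings.weight",
    "deberta.encoder.layer.0.output.dense.weight",
]

unused, missing = split_keys(checkpoint, deberta_expected)
print(f"{len(unused)} of {len(checkpoint)} checkpoint weights unused")
print(f"{len(missing)} model weights would be newly initialized")
```

If the names really differed only by a prefix, renaming keys before loading might be one workaround, but whether the two architectures are otherwise compatible is a separate question given how far their configs diverge.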

Why is the model generated with the pretraining script different from the model posted on the GitHub page?

How do you load a model of type "sas" with Hugging Face Transformers?
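Transformers' Auto classes choose a config class from the "model_type" string in config.json, which is why an unregistered type such as "sas" cannot be resolved. Recent Transformers releases let you register a custom pair with AutoConfig.register("sas", SasConfig) and AutoModelForPreTraining.register(SasConfig, SasForPreTraining), provided the config class's model_type attribute is "sas"; check that your installed version has these methods. SasConfig and SasForPreTraining here stand for classes that would have to come from the self-augmentation-strategy code itself (an assumption). The stdlib-only sketch below imitates that dispatch with a toy registry; every name in it is hypothetical.

```python
import json

# Toy registry: model_type string -> config class (hypothetical, not Transformers' code).
CONFIG_REGISTRY = {}


def register(model_type, config_cls):
    CONFIG_REGISTRY[model_type] = config_cls


def config_from_json(text):
    """Pick a config class by the "model_type" key, as an auto-config would."""
    data = json.loads(text)
    model_type = data.get("model_type")
    if model_type not in CONFIG_REGISTRY:
        raise ValueError(f"no config class registered for model_type={model_type!r}")
    return CONFIG_REGISTRY[model_type](**data)


class ToyConfig:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)  # store every JSON field as an attribute


class ToySasConfig(ToyConfig):
    """Hypothetical placeholder for a config class shipped with the SAS code."""


register("deberta", ToyConfig)
sas_json = '{"model_type": "sas", "hidden_size": 768}'

try:
    config_from_json(sas_json)  # fails: "sas" is not registered yet
except ValueError as err:
    print("before registering:", err)

register("sas", ToySasConfig)
cfg = config_from_json(sas_json)
print("after registering:", type(cfg).__name__, cfg.hidden_size)
```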

SAS_DA_base config:
{
  "architectures": [
    "SADebertaForPretraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "embedding_size": 768,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 768,
  "pos_att_type": [
    "c2p",
    "p2c"
  ],
  "position_biased_input": false,
  "relative_attention": true,
  "type_vocab_size": 0,
  "vocab_size": 30522
}

Config output by the pretraining script:
{
  "absolute_position_embedding": 1,
  "architectures": [
    "SasForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "augmentation_copies": 1,
  "augmentation_temperature": 1,
  "cold_start_epochs": 1.0,
  "debug_config": {
    "debugActivationInterval": 100000000,
    "debugExtraMetrics": 1,
    "debugGradOverflowInterval": 100,
    "debugMemStatsInterval": 1000,
    "debugMultiTasksConflictInterval": 1000,
    "logging_steps": 200
  },
  "dis_weight": "50-50",
  "dis_weight_scheduler": 4,
  "dynamic_masking": 0,
  "embedding_size": 768,
  "gen_weight": 1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 128,
  "model_type": "sas",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": [
    "absolute"
  ],
  "relative_position_embedding": 0,
  "summary_activation": "gelu",
  "summary_last_dropout": 0.1,
  "summary_type": "first",
  "summary_use_proj": true,
  "transformers_version": "4.3.0",
  "type_vocab_size": 2,
  "vocab_size": 30522
}
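Comparing the two configs key by key makes the gap concrete: besides model_type and architectures, they disagree on position handling (relative attention with c2p/p2c in the download, absolute position embeddings in the generated model), max_position_embeddings (512 vs 128), layer_norm_eps, and type_vocab_size. A minimal stdlib sketch of such a comparison follows; it uses excerpts of the two configs above, and load_config is a hypothetical helper for reading full config.json files from disk.

```python
import json

ABSENT = "<absent>"  # marker for keys present in only one config


def load_config(path):
    """Hypothetical helper: read a config.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def diff_configs(a, b):
    """Map each key whose value differs (or exists in only one config)
    to the pair (value in a, value in b)."""
    diff = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, ABSENT), b.get(key, ABSENT)
        if va != vb:
            diff[key] = (va, vb)
    return diff


# Excerpts copied from the two configs in this issue.
downloaded = {
    "architectures": ["SADebertaForPretraining"],
    "hidden_size": 768,
    "layer_norm_eps": 1e-07,
    "max_position_embeddings": 512,
    "model_type": "deberta",
    "relative_attention": True,
    "type_vocab_size": 0,
}
generated = {
    "architectures": ["SasForPreTraining"],
    "hidden_size": 768,
    "layer_norm_eps": 1e-12,
    "max_position_embeddings": 128,
    "model_type": "sas",
    "relative_position_embedding": 0,
    "type_vocab_size": 2,
}

for key, (old, new) in diff_configs(downloaded, generated).items():
    print(f"{key}: {old!r} -> {new!r}")
```

Running the same function on two full files loaded with load_config would list every differing field, including the SAS-specific training keys such as augmentation_copies and dis_weight.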
