strongio / keras-bert

A simple technique to integrate BERT from tf hub to keras

Jupyter Notebook 76.43% Python 23.57%

keras-bert's People

johndpope dantodor ag027592 legendtianjin dkyos ztx0728 shovonsg jkhlot hoangcuong2011 taymoorak timrajan vikchopde jas-ho woodongk fretana soonhwan-kwon kkkyan igeti nahidalam maximedb eduardofv kurtjanssensai gaoyi439 rsjain1978 newzq keithyau hassoudi zsoftwarerepository pjpan jendatx o20021106 shalini187 veritaem ushagayatri shahidash tr-ips-bahubali aaroncwhite rachel-sorek iknoorjobs b2220333 biranchi2018 urvashikhanna miguelperalvo drmegavolt sjs2109 mohan-mj ntoniocp kobkrit sayanbanerjee32 hsajjad mabu-dev technologycoder moojad urvishp80 geetdsa ghostintheshellarise ploych jill3240 maxliu mehrdad-naser-73 happy-face yuanziinlondon joaorobson hypnopump subhasree chayanbansal giancarlosotelo ancahu sharpant jogonba2 hasantanvir79 hy59 dmollaaliod suyash091 vedraiyani damianboh leiqi alexanderdoria neerajvashistha eakonor nitinh priyankadiddi caiocesarrm wuyingfeng1hao snowdj hannahyao deepaknlp tfdetang eyal8 jaingaurav3 13bmartens nadjet arvingoyal abiswas20 fneg drjlt sasharoberts deepaliverma kerneee makama-md

keras-bert's Issues

Check failed: NDIMS == dims() (2 vs. 4)Asking for tensor of 2 dimensions from a tensor of 4 dimensions

I just want to run this code, but even after changing it as suggested in another issue (removing the pooling layers from the trainable weights), I still get the same error. How can I solve this problem?

Has anyone managed to get this to work with SavedModel?

I've trained a model that fine-tunes a BertLayer, but I can't get it to export properly as a SavedModel. Any ideas?

Is output shape incorrect?

Thank you very much for helping us know how to do transfer learning with Bert by using Keras!
I have a small question about the shape of output tensor from Bert layer.
I see the following compute_output_shape function in BertLayer class:

def compute_output_shape(self, input_shape):
    return (input_shape[0], self.output_size)

It seems the function is meant to indicate that the output has shape (batch_size, output_size), but IMHO that is not the case.
The input is a list of 3 tensors, each of shape (batch_size, max_sequence_length), so input_shape[0] is the tuple (batch_size, max_sequence_length). The output shape therefore ends up as ((batch_size, max_sequence_length), output_size) instead of the expected (batch_size, output_size).

Please correct me if I am wrong. Thanks!
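A minimal sketch of the pooled-output fix, written as plain functions so the shape arithmetic can be checked without TensorFlow. It assumes input_shape arrives as a list of three plain tuples (input_ids, input_mask, segment_ids), as the nested summaries in these issues suggest; if your TF version passes TensorShape objects, they may need converting with as_list() first. Inside BertLayer, the body of the second function would become the method body.

```python
def compute_output_shape_buggy(input_shape, output_size=768):
    # Mirrors the original method: input_shape[0] is the whole
    # (batch_size, max_seq_length) tuple of the first input.
    return (input_shape[0], output_size)


def compute_output_shape_pooled(input_shape, output_size=768):
    # input_shape is a list of three (batch_size, max_seq_length) tuples;
    # take only the batch dimension from the first one.
    batch_size = input_shape[0][0]
    return (batch_size, output_size)


shapes = [(None, 256), (None, 256), (None, 256)]
print(compute_output_shape_buggy(shapes))   # ((None, 256), 768)
print(compute_output_shape_pooled(shapes))  # (None, 768)
```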

INVALID: Not observing the same accuracy reported when reproducing

Hi. I was trying to reproduce your results before using this in my research, to make sure I didn't miss anything. When the model was built, however, my numbers of trainable and non-trainable parameters differed from yours, even though the total number of parameters matched. Also, the accuracy after one epoch is reported as around 85.56% in your notebook, but I observed only around 50.10%. The check comparing predictions from the trained model with those from the reloaded saved model still printed True.

Could you please help me understand why I am not able to reproduce your results?

Setup details:
Ubuntu 18.04 running the tensorflow-1.14.0 Docker image with an NVIDIA GPU.

Trainable Parameters Different

I ran the code from 'keras-bert.ipynb' as is and observed that the number of trainable parameters in my run is '22,051,329' instead of the '3,147,009' in your run of the notebook. My accuracy is also only about 0.53. Can you please help me out? Thanks!
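One way to sanity-check the larger number is to count parameters by hand. The sketch below assumes BERT-base dimensions (hidden size 768, intermediate size 3072) and assumes the classification head is Dense(256) followed by Dense(1); both are assumptions, not values read from the notebook. Under them, 22,051,329 corresponds exactly to three fully trainable encoder layers plus the pooler dense layer plus the head.

```python
def dense_params(n_in, n_out):
    # weight matrix + bias vector
    return n_in * n_out + n_out


def layer_norm_params(hidden):
    # gamma + beta
    return 2 * hidden


def encoder_layer_params(hidden=768, intermediate=3072):
    attention = 3 * dense_params(hidden, hidden)  # query, key, value
    attention_out = dense_params(hidden, hidden) + layer_norm_params(hidden)
    ffn = dense_params(hidden, intermediate) + dense_params(intermediate, hidden)
    return attention + attention_out + ffn + layer_norm_params(hidden)


def trainable_params(n_layers, hidden=768, head_units=256):
    pooler = dense_params(hidden, hidden)
    # Assumed head: Dense(head_units) -> Dense(1)
    head = dense_params(hidden, head_units) + dense_params(head_units, 1)
    return n_layers * encoder_layer_params(hidden) + pooler + head


print(encoder_layer_params())  # 7087872
print(trainable_params(3))     # 22051329
```

If the count matches three whole layers, the run is fine-tuning far more weights than the notebook's reported run, which could plausibly affect accuracy for the same learning rate and epoch count.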

Error accessing variables attribute of bert module

When I try to build the model, I get the error: 'Module' object has no attribute 'variables'

This occurs specifically in the build function of the BertLayer class, when I try to access self.bert.variables.

I ran dir(self.bert) to list all the attributes of the object, and indeed it has no attribute called variables. These are the attributes I obtained:

['__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_graph', '_impl', '_name', '_spec', '_tags', '_trainable', 'export', 'get_input_info_dict', 'get_output_info_dict', 'get_signature_names', 'variable_map']

I'm using TF 1.13.0 with Python 3.5 on Windows 10.
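The attribute list suggests this tensorflow_hub version exposes variable_map but not variables on hub.Module; upgrading tensorflow_hub may be the simplest route. A hedged alternative is a small helper that prefers variables and otherwise flattens variable_map, whose values may be a single variable or a list of partitioned parts. The helper name module_variables and the FakeModule stand-in are hypothetical, used so the logic runs without TensorFlow.

```python
def module_variables(module):
    """Return a flat list of variables from a hub.Module-like object."""
    if hasattr(module, "variables"):
        return list(module.variables)
    flat = []
    for value in module.variable_map.values():
        # Partitioned variables are stored as a list of parts.
        if isinstance(value, (list, tuple)):
            flat.extend(value)
        else:
            flat.append(value)
    return flat


class FakeModule:
    # Hypothetical stand-in exposing only `variable_map`, like the old API.
    variable_map = {
        "bert/pooler/dense/kernel": "kernel",
        "bert/embeddings/word_embeddings": ["part_0", "part_1"],
    }


print(module_variables(FakeModule()))  # ['kernel', 'part_0', 'part_1']
```

In build, both uses of self.bert.variables would then become module_variables(self.bert).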

No pooling embeddings

Hi, is it possible to modify the code to get word-level embeddings instead of sentence-level ones?
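One way to read this: mean pooling collapses the token axis of BERT's sequence_output, while returning sequence_output directly keeps one 768-dimensional vector per token. The NumPy sketch below uses random arrays as a hypothetical stand-in for BERT's output to show both shapes side by side; in the layer, call would return sequence_output and compute_output_shape would report (batch_size, max_seq_length, 768).

```python
import numpy as np

batch_size, seq_len, hidden = 2, 5, 768
rng = np.random.default_rng(0)

# Hypothetical stand-in for BERT's "sequence_output": one vector per token.
sequence_output = rng.normal(size=(batch_size, seq_len, hidden))
# 1 for real tokens, 0 for padding.
input_mask = np.array([[1, 1, 1, 0, 0],
                       [1, 1, 1, 1, 1]], dtype=np.float32)

# Sentence level: masked mean over the token axis (axis=1).
masked = sequence_output * input_mask[:, :, None]
sentence_emb = masked.sum(axis=1) / (input_mask.sum(axis=1, keepdims=True) + 1e-10)

# Token level: keep the token axis, with padding positions zeroed out.
token_emb = masked

print(sentence_emb.shape)  # (2, 768)
print(token_emb.shape)     # (2, 5, 768)
```

Note that BERT's tokens are WordPiece sub-tokens, so true word-level vectors would still need a mapping from sub-tokens back to words (for example, keeping the first sub-token of each word).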

Nested tensor dimensions

I would like to use BERT for multi-class, multi-task classification. For each sentence to classify (say, with a fixed number of n tokens), BERT should (if I understand correctly) provide a 768-element vector per token, i.e., (n, 768). With batches, I would expect (None, n, 768). With keras-bert, however, I get ((None, n), 768). To feed this tensor into the YoonKimCNN from keras_text, I have to add a further dimension, but the nested structure remains, so the final layer also has shape ((None, n), m), even though I would expect (None, m) in the end. Structure:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_ids (InputLayer)          (None, 256)          0                                            
__________________________________________________________________________________________________
input_masks (InputLayer)        (None, 256)          0                                            
__________________________________________________________________________________________________
segment_ids (InputLayer)        (None, 256)          0                                            
__________________________________________________________________________________________________
bert_layer_1 (BertLayer)        ((None, 256), 768)   110104890   input_ids[0][0]                  
                                                                 input_masks[0][0]                
                                                                 segment_ids[0][0]                
__________________________________________________________________________________________________
reshape_1 (Reshape)             ((None, 256), 768, 1 0           bert_layer_1[0][0]               
__________________________________________________________________________________________________
consume_mask_1 (ConsumeMask)    ((None, 256), 768, 1 0           reshape_1[0][0]                  
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               ((None, 256), 766, 1 512         consume_mask_1[0][0]             
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               ((None, 256), 765, 1 640         consume_mask_1[0][0]             
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               ((None, 256), 764, 1 768         consume_mask_1[0][0]             
__________________________________________________________________________________________________
global_max_pooling1d_1 (GlobalM ((None, 256), 128)   0           conv1d_1[0][0]                   
__________________________________________________________________________________________________
global_max_pooling1d_2 (GlobalM ((None, 256), 128)   0           conv1d_2[0][0]                   
__________________________________________________________________________________________________
global_max_pooling1d_3 (GlobalM ((None, 256), 128)   0           conv1d_3[0][0]                   
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     ((None, 256), 384)   0           global_max_pooling1d_1[0][0]     
                                                                 global_max_pooling1d_2[0][0]     
                                                                 global_max_pooling1d_3[0][0]     
__________________________________________________________________________________________________
dropout_1 (Dropout)             ((None, 256), 384)   0           concatenate_1[0][0]              
__________________________________________________________________________________________________
dense_4 (Dense)                 ((None, 256), 256)   98560       dropout_1[0][0]                  
__________________________________________________________________________________________________
dense_1 (Dense)                 ((None, 256), 128)   49280       dropout_1[0][0]                  
__________________________________________________________________________________________________
dropout_4 (Dropout)             ((None, 256), 256)   0           dense_4[0][0]                    
__________________________________________________________________________________________________
dropout_2 (Dropout)             ((None, 256), 128)   0           dense_1[0][0]                    
__________________________________________________________________________________________________
dense_5 (Dense)                 ((None, 256), 128)   32896       dropout_4[0][0]                  
__________________________________________________________________________________________________
dense_2 (Dense)                 ((None, 256), 64)    8256        dropout_2[0][0]                  
__________________________________________________________________________________________________
dropout_5 (Dropout)             ((None, 256), 128)   0           dense_5[0][0]                    
__________________________________________________________________________________________________
dropout_3 (Dropout)             ((None, 256), 64)    0           dense_2[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 ((None, 256), 25)    3225        dropout_5[0][0]                  
__________________________________________________________________________________________________
dense_3 (Dense)                 ((None, 256), 1)     65          dropout_3[0][0]                  
==================================================================================================
Total params: 110,299,092
Trainable params: 71,072,922
Non-trainable params: 39,226,170
__________________________________________________________________________________________________

This looks different from what we can see here. Any suggestions on how to get rid of the nested structure are welcome.
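The nested ((None, 256), 768) entries are consistent with a compute_output_shape that returns (input_shape[0], self.output_size) while input_shape is a list of per-input shape tuples. A sketch for token-level output, assuming the layer returns sequence_output with shape (batch_size, max_seq_length, 768), unpacks the first input's shape instead. With that 3D output, Conv1D could slide over the token axis directly, so the Reshape that adds a channel dimension may no longer be needed.

```python
def compute_output_shape_sequence(input_shape, output_size=768):
    # input_shape: list of shapes for input_ids, input_masks, segment_ids,
    # each assumed to be a plain (batch_size, max_seq_length) tuple.
    batch_size, max_seq_length = input_shape[0][:2]
    return (batch_size, max_seq_length, output_size)


shapes = [(None, 256)] * 3
print(compute_output_shape_sequence(shapes))  # (None, 256, 768)
```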

NotImplementedError: Layers with arguments in `__init__` must override `get_config`.

Hello, I am trying to save a model built exactly from your code example, but I get the error below. Any advice?

---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
in ()
----> 1 tf.keras.models.save_model( model, 'model', overwrite=True, include_optimizer=True )
2

5 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py in save_model(model, filepath, overwrite, include_optimizer, save_format, signatures)
107 'or using save_weights.')
108 hdf5_format.save_model_to_hdf5(
--> 109 model, filepath, overwrite, include_optimizer)
110 else:
111 saved_model_save.save(model, filepath, overwrite, include_optimizer,

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py in save_model_to_hdf5(model, filepath, overwrite, include_optimizer)
91
92 try:
---> 93 model_metadata = saving_utils.model_metadata(model, include_optimizer)
94 for k, v in model_metadata.items():
95 if isinstance(v, (dict, list, tuple)):

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/saving_utils.py in model_metadata(model, include_optimizer, require_config)
158 except NotImplementedError as e:
159 if require_config:
--> 160 raise e
161
162 metadata = dict(

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/saving_utils.py in model_metadata(model, include_optimizer, require_config)
155 model_config = {'class_name': model.__class__.__name__}
156 try:
--> 157 model_config['config'] = model.get_config()
158 except NotImplementedError as e:
159 if require_config:

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py in get_config(self)
884 for layer in self.layers: # From the earliest layers on.
885 layer_class_name = layer.__class__.__name__
--> 886 layer_config = layer.get_config()
887
888 filtered_inbound_nodes = []

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in get_config(self)
578 # or that get_config has been overridden:
579 if len(extra_args) > 1 and hasattr(self.get_config, '_is_default'):
--> 580 raise NotImplementedError('Layers with arguments in __init__ must '
581 'override get_config.')
582 # TODO(reedwm): Handle serializing self._dtype_policy.

NotImplementedError: Layers with arguments in __init__ must override get_config.
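Keras raises this error when a layer takes constructor arguments but keeps the default get_config. The usual fix is to override get_config so it returns those arguments merged with the base config, which lets from_config rebuild the layer with cls(**config). Below is a minimal sketch of that pattern; LayerBase is a hypothetical stand-in for tf.keras.layers.Layer so the example runs without TensorFlow, and the argument names follow the BertEmbeddingLayer code in this thread (n_fine_tune_layers, pooling, bert_path).

```python
class LayerBase:
    """Hypothetical stand-in for tf.keras.layers.Layer's config contract."""

    def __init__(self, name=None, trainable=True):
        self.name = name or type(self).__name__.lower()
        self.trainable = trainable

    def get_config(self):
        return {"name": self.name, "trainable": self.trainable}

    @classmethod
    def from_config(cls, config):
        # Keras' default from_config also calls cls(**config).
        return cls(**config)


class BertLayer(LayerBase):
    def __init__(self, n_fine_tune_layers=10, pooling="mean",
                 bert_path="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1",
                 **kwargs):
        self.n_fine_tune_layers = n_fine_tune_layers
        self.pooling = pooling
        self.bert_path = bert_path
        super().__init__(**kwargs)

    def get_config(self):
        # Merge the constructor arguments into the base layer's config.
        config = super().get_config()
        config.update({
            "n_fine_tune_layers": self.n_fine_tune_layers,
            "pooling": self.pooling,
            "bert_path": self.bert_path,
        })
        return config


layer = BertLayer(n_fine_tune_layers=3, name="bert_layer")
clone = BertLayer.from_config(layer.get_config())
print(clone.n_fine_tune_layers, clone.name)  # 3 bert_layer
```

In a real tf.keras subclass, get_config would call super().get_config() the same way, and reloading would likely also need the class passed in, e.g. tf.keras.models.load_model('model', custom_objects={'BertLayer': BertLayer}).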

AttributeError: 'Module' object has no attribute 'variables'

Hi, I am using tensorflow-gpu and trying to run the code, but I am getting this error:

Check failed

Hello,
I modified part of the code:

bert_outputs = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)['sequence_output']
return (max_seq_length, self.output_size)

but when I run the code, I get the following error:
"F tensorflow/core/framework/tensor_shape.cc:44] Check failed: NDIMS == dims() (2 vs. 4)Asking for tensor of 2 dimensions from a tensor of 4 dimensions
Aborted (core dumped)"

What could be the reason?

Bert Embeddings Layer for Word Sense Disambiguation task

I am trying to use your Keras embedding-layer wrapper for WSD, but I get this error every time:

Traceback

Traceback (most recent call last):
  File "D:/SVC/GitLab/ahmed_elsheikh_1873337_nlp19project/code/model_bert_prova.py", line 234, in <module>
    model = baseline_model(output_size, visualize=True)
  File "D:/SVC/GitLab/ahmed_elsheikh_1873337_nlp19project/code/model_bert_prova.py", line 61, in baseline_model
    )(bert_embedding)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\layers\wrappers.py", line 473, in __call__
    return super(Bidirectional, self).__call__(inputs, **kwargs)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 746, in __call__
    self.build(input_shapes)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\layers\wrappers.py", line 612, in build
    self.forward_layer.build(input_shape)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\utils\tf_utils.py", line 149, in wrapper
    output_shape = fn(instance, input_shape)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\layers\recurrent.py", line 552, in build
    self.cell.build(step_input_shape)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\utils\tf_utils.py", line 149, in wrapper
    output_shape = fn(instance, input_shape)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\layers\recurrent.py", line 1934, in build
    constraint=self.kernel_constraint)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 609, in add_weight
    aggregation=aggregation)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\training\checkpointable\base.py", line 639, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1977, in make_variable
    aggregation=aggregation)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variables.py", line 183, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variables.py", line 146, in _variable_v1_call
    aggregation=aggregation)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variables.py", line 125, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2437, in default_variable_creator
    import_scope=import_scope)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variables.py", line 187, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 297, in __init__
    constraint=constraint)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 409, in _init_from_args
    initial_value() if init_from_fn else initial_value,
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1959, in <lambda>
    shape, dtype=dtype, partition_info=partition_info)
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\init_ops.py", line 473, in __call__
    scale /= max(1., (fan_in + fan_out) / 2.)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x000001D506B45358>>
Traceback (most recent call last):
  File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\client\session.py", line 738, in __del__
TypeError: 'NoneType' object is not callable

The wrapper layer from your repository

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer


class BertEmbeddingLayer(Layer):
    '''
    Integrate BERT Embeddings from tensorflow hub into a
    custom Keras layer.
    references:
        1. https://github.com/strongio/keras-bert
        2. https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1
    '''

    def __init__(self, n_fine_tune_layers=10, pooling="mean",
                 bert_path="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1",
                 **kwargs,):
        self.n_fine_tune_layers = n_fine_tune_layers
        self.trainable = True
        self.output_size = 768
        self.pooling = pooling
        self.bert_path = bert_path
        if self.pooling not in ["first", "mean"]:
            raise NameError(
                f"Undefined pooling type (must be either first or mean, but is {self.pooling}")

        super(BertEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.bert = hub.Module(self.bert_path,
                               trainable=self.trainable,
                               name=f"{self.name}_module")

        # Remove unused layers
        trainable_vars = self.bert.variables
        if self.pooling == "first":
            trainable_vars = [var for var in trainable_vars if not "/cls/" in var.name]
            trainable_layers = ["pooler/dense"]

        elif self.pooling == "mean":
            trainable_vars = [var for var in trainable_vars if not "/cls/" in var.name and not "/pooler/" in var.name]
            trainable_layers = []
        else:
            raise NameError(f"Undefined pooling type (must be either first or mean, but is {self.pooling}")

        # Select how many layers to fine tune
        for i in range(self.n_fine_tune_layers):
            trainable_layers.append(f"encoder/layer_{str(11 - i)}")

        # Update trainable vars to contain only the specified layers
        trainable_vars = [
            var
            for var in trainable_vars
            if any([l in var.name for l in trainable_layers])
        ]

        # Add to trainable weights
        for var in trainable_vars:
            self._trainable_weights.append(var)

        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)

        super(BertEmbeddingLayer, self).build(input_shape)

    def call(self, inputs):
        inputs = [K.cast(x, dtype="int32") for x in inputs]
        input_ids, input_mask, segment_ids = inputs
        bert_inputs = dict(input_ids=input_ids,
                           input_mask=input_mask,
                           segment_ids=segment_ids
                           )
        if self.pooling == "first":
            pooled = self.bert(inputs=bert_inputs,
                               signature="tokens",
                               as_dict=True)["pooled_output"]
        elif self.pooling == "mean":
            result = self.bert(inputs=bert_inputs,
                               signature="tokens",
                               as_dict=True)["sequence_output"]

            def mul_mask(x, m):
                return x * tf.expand_dims(m, axis=-1)

            def masked_reduce_mean(x, m):
                return tf.reduce_sum(mul_mask(x, m), axis=1) / (tf.reduce_sum(m, axis=1, keepdims=True) + 1e-10)
            input_mask = tf.cast(input_mask, tf.float32)
            pooled = masked_reduce_mean(result, input_mask)
        else:
            raise NameError(f"Undefined pooling type (must be either first or mean, but is {self.pooling}")

        return pooled

    def compute_output_shape(self, input_shape):
        return input_shape[0][0], input_shape[0][1], self.output_size

My BiLSTM Model

import os
import yaml
import numpy as np
from argparse import ArgumentParser

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.layers import (LSTM, Add, Bidirectional, Dense, Input, TimeDistributed, Embedding)

from tensorflow.keras.preprocessing.sequence import pad_sequences

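try:
    from bert.tokenization import FullTokenizer
except ModuleNotFoundError:
    # Install the package, then retry the import so FullTokenizer is defined.
    os.system('pip install bert-tensorflow')
    from bert.tokenization import FullTokenizer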

from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
from tqdm import tqdm

from keras_bert import BertEmbeddingLayer
from model_utils import visualize_plot_mdl
from parsing_dataset import load_dataset
from utilities import configure_tf, initialize_logger


def parse_args():
    parser = ArgumentParser(description="WSD")
    parser.add_argument("--model_type", default='baseline', type=str,
                        help="""Choose the model: baseline: BiLSTM Model.
                                attention: Attention Stacked BiLSTM Model.
                                seq2seq: Seq2Seq Attention.""")

    return vars(parser.parse_args())


def train_model(mdl, data, epochs=1, batch_size=32):
    [train_input_ids, train_input_masks, train_segment_ids], train_labels = data
    history = mdl.fit([train_input_ids, train_input_masks, train_segment_ids],
                      train_labels, epochs=epochs, batch_size=batch_size)
    return history


def baseline_model(output_size):
    hidden_size = 128
    max_seq_len = 64

    in_id = Input(shape=(max_seq_len,), name="input_ids")
    in_mask = Input(shape=(max_seq_len,), name="input_masks")
    in_segment = Input(shape=(max_seq_len,), name="segment_ids")
    bert_inputs = [in_id, in_mask, in_segment]

    bert_embedding = BertEmbeddingLayer()(bert_inputs)
    embedding_size = 768

    bilstm = Bidirectional(LSTM(hidden_size,
                                return_sequences=True,
                                input_shape=(None, None, embedding_size)
                                ),
                           merge_mode='sum'
                           )(bert_embedding)

    output = TimeDistributed(Dense(output_size, activation="softmax"))(bilstm)

    mdl = Model(inputs=bert_inputs, outputs=output, name="Bert_BiLSTM")

    mdl.compile(loss="sparse_categorical_crossentropy",
                optimizer='adadelta', metrics=["acc"])

    return mdl


def initialize_vars(sess):
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    K.set_session(sess)


class PaddingInputExample(object):
    """Fake example so the num input examples is a multiple of the batch size.
  When running eval/predict on the TPU, we need to pad the number of examples
  to be a multiple of the batch size, because the TPU requires a fixed batch
  size. The alternative is to drop the last batch, which is bad because it means
  the entire output data won't be generated.
  We use this class instead of `None` because treating `None` as padding
  batches could cause silent errors.
  """

class InputExample(object):
    """A single training/test example for simple sequence classification."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        """Constructs a InputExample.
    Args:
      guid: Unique id for the example.
      text_a: string. The un-tokenized text of the first sequence. For single
        sequence tasks, only this sequence must be specified.
      text_b: (Optional) string. The un-tokenized text of the second sequence.
        Only must be specified for sequence pair tasks.
      label: (Optional) string. The label of the example. This should be
        specified for train and dev examples, but not for test examples.
    """
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


def create_tokenizer_from_hub_module(bert_path="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"):
    """Get the vocab file and casing info from the Hub module."""
    bert_module = hub.Module(bert_path)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    vocab_file, do_lower_case = sess.run(
        [
            tokenization_info["vocab_file"],
            tokenization_info["do_lower_case"],
        ]
    )

    return FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)


def convert_single_example(tokenizer, example, max_seq_length=256):
    """Converts a single `InputExample` into a single `InputFeatures`."""

    if isinstance(example, PaddingInputExample):
        input_ids = [0] * max_seq_length
        input_mask = [0] * max_seq_length
        segment_ids = [0] * max_seq_length
        label = [0] * max_seq_length
        return input_ids, input_mask, segment_ids, label

    tokens_a = tokenizer.tokenize(example.text_a)
    if len(tokens_a) > max_seq_length - 2:
        tokens_a = tokens_a[0: (max_seq_length - 2)]

    tokens = []
    segment_ids = []
    tokens.append("[CLS]")
    segment_ids.append(0)
    example.label.append(0)
    for token in tokens_a:
        tokens.append(token)
        segment_ids.append(0)
    tokens.append("[SEP]")
    segment_ids.append(0)
    example.label.append(0)

    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # The mask has 1 for real tokens and 0 for padding tokens. Only real
    # tokens are attended to.
    input_mask = [1] * len(input_ids)

    # Zero-pad up to the sequence length.
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)
        example.label.append(0)

    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length

    return input_ids, input_mask, segment_ids, example.label


def convert_examples_to_features(tokenizer, examples, max_seq_length=256):
    """Convert a set of `InputExample`s to a list of `InputFeatures`."""

    input_ids, input_masks, segment_ids, labels = [], [], [], []
    for example in tqdm(examples, desc="Converting examples to features"):
        input_id, input_mask, segment_id, label = convert_single_example(tokenizer, example, max_seq_length)
        input_ids.append(np.array(input_id))
        input_masks.append(np.array(input_mask))
        segment_ids.append(np.array(segment_id))
        labels.append(np.array(label))
    return np.array(input_ids), np.array(input_masks), np.array(segment_ids), np.array(labels).reshape(-1, 1)


def convert_text_to_examples(texts, labels):
    """Create InputExamples"""
    InputExamples = []
    for text, label in zip(texts, labels):
        InputExamples.append(
            InputExample(guid=None, text_a=" ".join(text), text_b=None, label=label)
        )
    return InputExamples


# Initialize session
sess = tf.Session()

params = parse_args()
initialize_logger()
configure_tf()

# Load our config file
config_file_path = os.path.join(os.getcwd(), "config.yaml")
with open(config_file_path) as config_file:
    config_params = yaml.safe_load(config_file)

# This parameter keeps train_x in the form of words, so that your keras-elmo layer can be used
elmo = config_params["use_elmo"]  
dataset = load_dataset(elmo=elmo)
vocabulary_size = dataset.get("vocabulary_size")
output_size = dataset.get("output_size")

# Parse data in Bert format
max_seq_length = 64
train_x = dataset.get("train_x")
train_text = [' '.join(x) for x in train_x]
train_text = [' '.join(t.split()[0:max_seq_length]) for t in train_text]
train_text = np.array(train_text, dtype=object)[:, np.newaxis]
# print(train_text.shape)  # (37184, 1)
train_labels = dataset.get("train_y")

# Instantiate tokenizer
tokenizer = create_tokenizer_from_hub_module()

# Convert data to InputExample format
train_examples = convert_text_to_examples(train_text, train_labels)

# Extract features
(train_input_ids, train_input_masks, train_segment_ids, train_labels) = convert_examples_to_features(tokenizer, train_examples, max_seq_length=max_seq_length)

bert_inputs = [train_input_ids, train_input_masks, train_segment_ids]
data = bert_inputs, train_labels
del dataset

model = baseline_model(output_size)

# Instantiate variables
initialize_vars(sess)

history = train_model(model, data)

Can you please let me know how to solve it?
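A likely cause, assuming the LSTM is built from the tensor that call actually returns: with pooling="mean" the layer returns a 2D (batch_size, 768) tensor, even though compute_output_shape declares three dimensions. An RNN builds its cell from the per-timestep shape, roughly (batch,) + input_shape[2:], so a 2D input leaves the cell with no feature dimension, and the kernel initializer receives None for fan_in, which matches the TypeError above. The sketch below is a simplified, hypothetical re-creation of that arithmetic, not TensorFlow's code.

```python
def rnn_step_input_shape(input_shape):
    # Simplified: drop the time axis, keep the batch and feature axes.
    return (input_shape[0],) + tuple(input_shape[2:])


def lstm_kernel_fans(input_shape, units=128):
    step_shape = rnn_step_input_shape(input_shape)
    input_dim = step_shape[-1]   # None when there is no feature axis
    return input_dim, 4 * units  # LSTM kernel shape: (input_dim, 4 * units)


print(lstm_kernel_fans((None, 768)))      # (None, 512)
print(lstm_kernel_fans((None, 64, 768)))  # (768, 512)
```

For token-level WSD, the layer would need to return the 3D sequence_output, and the labels would need one entry per token; the reshape(-1, 1) in convert_examples_to_features would then probably need revisiting as well.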

Text classification (keras_text): 'Node' object has no attribute 'output_masks'

Hi,

I would like to use the output of a BertLayer as input to a YoonKimCNN, which is implemented in keras_text. I already realized that mixing tf.keras and keras imports is not a good idea. However, YoonKimCNN is currently only available in keras_text, which leads to the following error:

AttributeError: 'Node' object has no attribute 'output_masks'

After several changes, I currently use tensorflow 1.12.0 and keras 2.2.4.

Thanks in advance for any suggestions.

failed prediction error.

I am running the script on my machine with the following configuration:
TF = 1.14
OS = Windows 10
Python = 3.7

Here is the full error

  File "keras-bert.py", line 354, in <module>
    main()
  File "keras-bert.py", line 339, in main
    batch_size=32,
  File "C:\Users\Urvish\Envs\tf_1_14\lib\site-packages\tensorflow\python\keras\engine\training.py", line 780, in fit
    steps_name='steps_per_epoch')
  File "C:\Users\Urvish\Envs\tf_1_14\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 363, in model_iteration
    batch_outs = f(ins_batch)
  File "C:\Users\Urvish\Envs\tf_1_14\lib\site-packages\tensorflow\python\keras\backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "C:\Users\Urvish\Envs\tf_1_14\lib\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
  (0) Failed precondition: Error while reading resource variable bert_layer_module/bert/encoder/layer_9/output/LayerNorm/gamma from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/bert_layer_module/bert/encoder/layer_9/output/LayerNorm/gamma/class tensorflow::Var does not exist.
         [[{{node bert_layer/bert_layer_module_apply_tokens/bert/encoder/layer_9/output/LayerNorm/batchnorm/mul/ReadVariableOp}}]]
         [[loss/mul/_489]]
  (1) Failed precondition: Error while reading resource variable bert_layer_module/bert/encoder/layer_9/output/LayerNorm/gamma from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/bert_layer_module/bert/encoder/layer_9/output/LayerNorm/gamma/class tensorflow::Var does not exist.
         [[{{node bert_layer/bert_layer_module_apply_tokens/bert/encoder/layer_9/output/LayerNorm/batchnorm/mul/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.

Error while build_model(max_seq_length)

TypeError Traceback (most recent call last)
in ()
----> 1 model = build_model(max_seq_length)
2
3 # Instantiate variables
4 initialize_vars(sess)
5

in build_model(max_seq_length)
10
11 #model=tf.keras.layers(inputs=bert_inputs, outputs=pred)
---> 12 model = tf.keras.models.Model(inputs=bert_inputs, outputs=pred)
13
14 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in __init__(self, *args, **kwargs)
127
128 def __init__(self, *args, **kwargs):
--> 129 super(Model, self).__init__(*args, **kwargs)
130 # initializing _distribution_strategy here since it is possible to call
131 # predict on a model without compiling it.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py in __init__(self, *args, **kwargs)
165 self._init_subclassed_network(**kwargs)
166
--> 167 tf_utils.assert_no_legacy_layers(self.layers)
168
169 # Several Network methods have "no_automatic_dependency_tracking"

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/tf_utils.py in assert_no_legacy_layers(layers)
397 'classes), please use the tf.keras.layers implementation instead. '
398 '(Or, if writing custom layers, subclass from tf.keras.layers rather '
--> 399 'than tf.layers)'.format(layer_str))
400
401

TypeError: The following are legacy tf.layers.Layers:
<__main__.BertLayer object at 0x7fa4e1b239b0>
To use keras as a framework (for instance using the Network, Model, or Sequential classes), please use the tf.keras.layers implementation instead. (Or, if writing custom layers, subclass from tf.keras.layers rather than tf.layers)

How to visualize attention in Bert ?

Can you please suggest a way to visualize BERT's attention while using your code (that is, while loading the weights from TensorFlow Hub)?

NotImplementedError: Layers with arguments in `__init__` must override `get_config`.

Greetings. When I try to run the last cell, which saves and loads the model, I get this error:
NotImplementedError: Layers with arguments in __init__ must override get_config.

Error when calling model = build_model(max_seq_length)

I am getting the following error:
RuntimeError: variable_scope module_1/ was unused but the corresponding name_scope was already taken.
Could you please help?

AttributeError: module 'tensorflow_hub.tf_v1' has no attribute 'estimator'

I installed tensorflow-hub with pip install tensorflow-hub.
I am using tensorflow==1.13.1 and python==3.7.4 (64-bit).
Could anyone help me with this issue?

'AutoTrackable' object is not callable

When I try to run the same code in Google Colab, it throws an error when calling the create_tokenizer_from_hub_module() function.

How to re-use the keras-bert implementation for question answering task?

Has anyone tried replicating this in Keras with the SQuAD dataset? It is not clear how we should prepare the input data for a custom BERT model, for example one with an LSTM on top of bert-base.
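For question answering, BERT reads the question and the context as one sequence pair: `[CLS] question [SEP] context [SEP]`, with segment id 0 for the first part and 1 for the second. A minimal sketch of building the three per-example inputs used elsewhere on this page (ids, masks, segment ids), assuming both texts are already WordPiece-tokenized and that `vocab` maps tokens to ids. The helper `build_pair_features` and its keep-the-question, truncate-the-context strategy are illustrative assumptions, not code from this repository.

```python
from typing import Dict, List, Tuple


def build_pair_features(
    question_tokens: List[str],
    context_tokens: List[str],
    vocab: Dict[str, int],
    max_seq_length: int,
) -> Tuple[List[int], List[int], List[int]]:
    """Pack a (question, context) pair as [CLS] q [SEP] c [SEP] (hypothetical helper)."""
    # Reserve room for [CLS] and the two [SEP] tokens.
    budget = max_seq_length - len(question_tokens) - 3
    if budget <= 0:
        raise ValueError("question is too long for max_seq_length")
    # Simplest strategy: keep the whole question, truncate the context.
    context_tokens = context_tokens[:budget]

    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    # Segment 0: [CLS], question, first [SEP]. Segment 1: context, final [SEP].
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    input_ids = [vocab[t] for t in tokens]
    input_mask = [1] * len(input_ids)

    # Zero-pad all three lists to max_seq_length (id 0 is [PAD] in the standard BERT vocab).
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding


# Tiny made-up vocabulary for illustration only.
vocab = {tok: i for i, tok in enumerate(["[PAD]", "[CLS]", "[SEP]", "who", "wrote", "it", "bob", "did"])}
ids, mask, segments = build_pair_features(["who", "wrote", "it"], ["bob", "did"], vocab, 10)
print(ids)       # [1, 3, 4, 5, 2, 6, 7, 2, 0, 0]
print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
```

With this packing, SQuAD answer start/end token positions inside the context would be shifted by `len(question_tokens) + 2` to index into the packed sequence.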

Multiclass classification

Hello! Thanks for the notebook, it is really helpful! I am trying to make it work for multiclass classification but I am having some difficulties. My dataset consists of strings with multiple labels, which I one-hot encode before the train/test split and then feed into the InputExample class. That seems to work, but when I later try to call the model it gives me the following error:

"Input arrays should have the same number of samples as target arrays. Found 10251 input samples and 51255 target samples."

I suspect it has something to do with how y is converted to features, since 10251 x 5 = 51255 and I have 5 classes. Is there something specific to binary classification in your code that would raise this error?
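The numbers in the error are what you get when a one-hot label matrix is flattened into a single column: 10251 x 5 = 51255. A minimal NumPy sketch, assuming the feature conversion reshapes the labels with `reshape(-1, 1)` (a natural choice when each example has one binary label); the class ids below are made up for illustration.

```python
import numpy as np

n_classes = 5
# Made-up integer class ids, one per example.
class_ids = np.array([0, 3, 1, 4])
# One-hot encode: shape (n_examples, n_classes).
one_hot = np.eye(n_classes, dtype=np.float32)[class_ids]

# Flattening to one column (fine for a single binary label per example)
# yields n_examples * n_classes rows instead of n_examples.
flattened = np.array(one_hot).reshape(-1, 1)
print(flattened.shape)  # (20, 1)

# Keeping one row per example preserves the (n_examples, n_classes) layout
# expected by an output layer with n_classes units.
per_example = np.array(one_hot).reshape(len(one_hot), -1)
print(per_example.shape)  # (4, 5)
```

With labels shaped `(n_examples, n_classes)`, the final layer would typically become a `Dense(n_classes, activation="softmax")` trained with `categorical_crossentropy`, rather than a single sigmoid unit with `binary_crossentropy`.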

Unable to initialise training

I am having the following issue.

The model compiles and prints the output below. However, when I call model.fit(), nothing happens, even though verbose mode is turned on.

When I look at my hardware utilisation, my GPU has memory allocated to the process, but its utilisation is 0-2%. On my CPU, only one core is used by the process, at 100% utilisation.

To test my tensorflow-gpu install, I ran the TensorFlow CNN example and got 20% GPU utilisation.

I don't think it is a preprocessing bottleneck, as I load my training data into memory.

Thanks.

Code:
import numpy as np
from sklearn.preprocessing import OneHotEncoder

bert_path = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
max_seq_length = 256

# MyDocs and build_model are defined elsewhere (not shown)
corpus = MyDocs("datasets/bbc/raw", bert_path, max_seq_length)

ids = []
masks = []
segment_ids = []
for input_id, mask, segment, label in corpus:
    ids.append(input_id)
    masks.append(mask)  # this example's mask, not the `masks` list itself
    segment_ids.append(segment)
# One integer array of shape (n_examples, max_seq_length) per BERT input
X = [np.array(ids), np.array(masks), np.array(segment_ids)]

labels = corpus.labels
label_encoder = OneHotEncoder()
y = label_encoder.fit_transform(np.array(labels).reshape(-1, 1)).todense()

print('Building model...')
model = build_model(bert_path, max_seq_length)

print('Training model...')
history = model.fit(X, y,
                    validation_split=0.2,
                    epochs=1,
                    batch_size=1,
                    verbose=2,
                    use_multiprocessing=True)

Output:

Building model...
W0709 21:57:53.871020 140194145126208 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0709 21:57:53.922768 140194145126208 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

Training model...
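A quick sanity check before calling `model.fit` is to stack each of the three input lists into an integer array and confirm that all of them have shape `(n_examples, max_seq_length)`. A mistake while collecting them (for example appending the wrong variable inside the loop) then shows up as an immediate, readable error. A minimal sketch, assuming the three-input layout used throughout this page; `stack_bert_inputs` is a hypothetical helper, not part of the repository.

```python
from typing import List, Sequence

import numpy as np


def stack_bert_inputs(
    ids: Sequence[Sequence[int]],
    masks: Sequence[Sequence[int]],
    segments: Sequence[Sequence[int]],
    max_seq_length: int,
) -> List[np.ndarray]:
    """Stack per-example lists into int32 arrays and validate shapes (hypothetical helper)."""
    arrays = []
    for name, rows in (("input_ids", ids), ("input_masks", masks), ("segment_ids", segments)):
        arr = np.asarray(rows, dtype=np.int32)
        # Each input must be a 2-D (n_examples, max_seq_length) matrix.
        if arr.ndim != 2 or arr.shape[1] != max_seq_length:
            raise ValueError(f"{name} has shape {arr.shape}, expected (n, {max_seq_length})")
        arrays.append(arr)
    # All three inputs must describe the same number of examples.
    n_rows = {a.shape[0] for a in arrays}
    if len(n_rows) != 1:
        raise ValueError(f"inputs disagree on the number of examples: {sorted(n_rows)}")
    return arrays
```

The returned list can be passed directly as the `x` argument of `model.fit`, in the same order as the model's inputs.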

" ".join(text) erroneously splits everything into characters

In keras-bert.ipynb, I see the following:


def convert_text_to_examples(texts, labels):
    """Create InputExamples"""
    InputExamples = []
    for text, label in zip(texts, labels):
        InputExamples.append(
            InputExample(guid=None, text_a=" ".join(text), text_b=None, label=label)
        )
    return InputExamples

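It appears that " ".join(text) actually splits the words into characters: when text is a plain string, join iterates over it character by character and puts a space between each one. This in turn causes BERT to tokenize individual characters rather than whole words or word pieces.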
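Whether this bites depends on what `text` is. `str.join` iterates over its argument, so a plain string is split into characters, while a one-element row (such as the rows built by `np.array(train_text, dtype=object)[:, np.newaxis]` in an earlier snippet on this page) yields the sentence unchanged. A minimal sketch; `as_sentence` is a hypothetical guard that accepts either form.

```python
from typing import Sequence, Union


def as_sentence(text: Union[str, Sequence[str]]) -> str:
    """Return one sentence whether `text` is a string or a sequence of strings."""
    if isinstance(text, str):
        # Already a sentence: joining it would insert spaces between characters.
        return text
    # A one-element row like ["good film"] or a token list like ["good", "film"].
    return " ".join(text)


print(" ".join("good film"))          # g o o d   f i l m
print(" ".join(["good film"]))        # good film
print(as_sentence("good film"))       # good film
print(as_sentence(["good", "film"]))  # good film
```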

Variable split error in BERT LAYER

ValueError Traceback (most recent call last)
in ()
1 #Training the model
----> 2 model = build_model(max_seq_length)
3
4 # Instantiate variables
5 initialize_vars(sess)

in build_model(max_seq_length)
7 bert_inputs = [in_id, in_mask, in_segment]
8
----> 9 bert_output = BertLayer(n_fine_tune_layers=3)(bert_inputs)

~\Anaconda\lib\site-packages\tensorflow\python\layers\base.py in __call__(self, inputs, *args, **kwargs)
372
373 # Actually call layer
--> 374 outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
375
376 if not context.executing_eagerly():

~\Anaconda\lib\site-packages\tensorflow\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
744 # the user has manually overwritten the build method do we need to
745 # build it.
--> 746 self.build(input_shapes)
747 # We must set self.built since user defined build functions are not
748 # constrained to set self.built.

in build(self, input_shape)
16 for var in self.bert.variables:
17 if "encoder" in var.name:
---> 18 layer_no = int(var.name.split("/")[3])
19 layer_no = inti(layer_no.split("_")[-1])
20 if layer_no >= 12 - self.n_fine_tune_layers:

ValueError: invalid literal for int() with base 10: 'encoder'
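Indexing `var.name.split("/")` at a fixed position breaks as soon as the name prefix changes: where the `layer_N` component sits depends on how many scopes precede `bert/encoder/`, and even at the right position that component is `layer_9`, not a bare integer. A minimal sketch that searches for the `encoder/layer_<N>/` pattern instead, assuming variable names shaped like the ones in the `FailedPreconditionError` messages on this page; `encoder_layer_index` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches ".../encoder/layer_9/..." and captures the layer number.
_LAYER_RE = re.compile(r"(?:^|/)encoder/layer_(\d+)/")


def encoder_layer_index(var_name: str) -> Optional[int]:
    """Return N for variables under encoder/layer_N/, or None for any other variable."""
    match = _LAYER_RE.search(var_name)
    return int(match.group(1)) if match else None


name = "bert_layer/bert_layer_module/bert/encoder/layer_9/output/LayerNorm/gamma:0"
print(name.split("/")[3])         # encoder
print(encoder_layer_index(name))  # 9
print(encoder_layer_index("bert_layer_module/bert/pooler/dense/bias:0"))  # None
```

Inside `build`, the traceback's comparison `layer_no >= 12 - self.n_fine_tune_layers` could then be applied to `encoder_layer_index(var.name)` after checking that it is not `None`.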

FailedPreconditionError

Has anyone encountered this error?

Traceback (most recent call last):
  File "keras-bert.py", line 336, in <module>
    main()
  File "keras-bert.py", line 333, in main
    model.fit([train_input_ids, train_input_masks, train_segment_ids],train_labels,validation_data=([test_input_ids, test_input_masks, test_segment_ids],test_labels,),epochs=1,batch_size=32,)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py", line 643, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 664, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 383, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/backend.py", line 3353, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable bert_layer_module/bert/encoder/layer_9/attention/self/query/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/bert_layer_module/bert/encoder/layer_9/attention/self/query/bias/N10tensorflow3VarE does not exist.
         [[{{node bert_layer/bert_layer_module_apply_tokens/bert/encoder/layer_9/attention/self/query/BiasAdd/ReadVariableOp}}]]

The code is basically the same as in the repository, with a few minor changes I made along the way to get past other errors.

Wrong order of values when calling bert.variables and fine tune after that

Thank you very much for the article. After reading it, I wanted to understand BERT more deeply and noticed the following in your code.
For fine-tuning, you use these lines:
trainable_vars = self.bert.variables
trainable_vars = trainable_vars[-self.n_fine_tune_layers:]
However, self.bert.variables returns the variables sorted by name, so the blocks are in lexicographic order (layer_1, layer_10, layer_11, layer_2, ..., layer_9) and the 11th transformer block comes before the 9th. As a result, fine-tuning trains intermediate layers while the others stay completely frozen.

bert.variables returns:

 <tf.Variable 'BERT_module_1/bert/embeddings/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/position_embeddings:0' shape=(512, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/word_embeddings:0' shape=(119547, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/pooler/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/pooler/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/output_bias:0' shape=(119547,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/dense/kernel:0' shape=(768, 768) dtype=float32>]
```
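On removing the pooler from the trainable weights: the listing above appears to be sorted alphabetically by name (`layer_8` and `layer_9` come last, just before `pooler`). That order would put `layer_10` and `layer_11` ahead of `layer_2`, so taking the last N variables by position does not reliably select the top N encoder layers. Below is a sketch that selects variables by their parsed layer index instead. The function and its parameters (`n_fine_tune_layers`, `train_pooler`) are hypothetical, and it assumes 12 encoder layers as in BERT-Base.

```python
import re

# Matches names such as 'BERT_module_1/bert/encoder/layer_9/output/dense/kernel:0'
_LAYER_RE = re.compile(r"/encoder/layer_(\d+)/")


def select_fine_tune_vars(names, n_fine_tune_layers, num_layers=12, train_pooler=False):
    """Hypothetical helper: return the variable names to keep trainable.

    Keeps the top `n_fine_tune_layers` encoder layers, keeps the pooler only
    if `train_pooler` is True, and always drops the masked-LM head ('/cls/').
    """
    first_trainable = num_layers - n_fine_tune_layers
    selected = []
    for name in names:
        if "/cls/" in name:
            continue  # pre-training head, not needed for classification
        if "/pooler/" in name:
            if train_pooler:
                selected.append(name)
            continue
        match = _LAYER_RE.search(name)
        if match and int(match.group(1)) >= first_trainable:
            selected.append(name)
    return selected


names = [
    "BERT_module_1/bert/encoder/layer_1/output/dense/kernel:0",
    "BERT_module_1/bert/encoder/layer_10/output/dense/kernel:0",
    "BERT_module_1/bert/encoder/layer_11/output/dense/kernel:0",
    "BERT_module_1/bert/encoder/layer_9/output/dense/kernel:0",
    "BERT_module_1/bert/pooler/dense/kernel:0",
    "BERT_module_1/cls/predictions/output_bias:0",
]
top_two = select_fine_tune_vars(names, n_fine_tune_layers=2)
```

Parsing the index with a regex, rather than a substring test such as `"layer_1" in name`, also keeps `layer_1` from accidentally matching `layer_10` and `layer_11`.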

ModuleNotFoundError: No module named 'bert'

Where is the `bert` folder? The code cannot import it, and I checked all your repos but could not find the code for BERT itself. I would really appreciate a reply about this issue.

Thanks

The error is: ModuleNotFoundError: No module named 'bert'
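A likely cause, assuming the notebook imports its tokenizer from `bert.tokenization` (the import line is not shown in this excerpt): that module comes from Google's original BERT code, which is published on PyPI as `bert-tensorflow`, so it probably has to be installed separately (`pip install bert-tensorflow`). The sketch below is a generic check, not part of this repo, that lists which top-level imports are missing before running the notebook. The module list in it is an assumption.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate.

    For top-level names, find_spec only looks the module up; it does not import it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed dependency list; adjust it to the imports the notebook actually uses.
missing = missing_modules(["tensorflow", "tensorflow_hub", "bert"])
if missing:
    print("Missing modules:", ", ".join(missing))
```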

Project doesn't have a requirements.txt

It isn't really an issue, but it would be nice to have a requirements.txt here.

Unable to run the code in Google Colaboratory

I got this error when I ran the code in Google Colaboratory:

TypeError: The following are legacy tf.layers.Layers:
<__main__.BertLayer object at 0x7fa94d2f5048>
To use keras as a framework (for instance using the Network, Model, or Sequential classes), please use the tf.keras.layers implementation instead. (Or, if writing custom layers, subclass from tf.keras.layers rather than tf.layers)
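The error message itself points to the fix: a custom layer used inside a `tf.keras` model has to subclass `tf.keras.layers.Layer`, not the legacy `tf.layers.Layer`. Below is a minimal sketch of that pattern, assuming a recent TensorFlow 2.x install. `ScaleLayer` is a hypothetical toy layer that learns one scalar, not the repo's `BertLayer`; BERT-specific weight creation and the forward pass would go in the same `build` and `call` methods.

```python
import numpy as np
import tensorflow as tf


class ScaleLayer(tf.keras.layers.Layer):
    """Hypothetical toy layer: multiplies its input by one learned scalar.

    Subclassing tf.keras.layers.Layer (not the legacy tf.layers.Layer)
    is what lets tf.keras.Model accept the layer.
    """

    def build(self, input_shape):
        # Weights are created lazily, once the input shape is known.
        self.scale = self.add_weight(
            name="scale", shape=(), initializer="ones", trainable=True
        )

    def call(self, inputs):
        return inputs * self.scale


# Use the custom layer inside a functional tf.keras model.
inputs = tf.keras.Input(shape=(4,))
outputs = ScaleLayer()(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

out = model.predict(np.ones((2, 4), dtype="float32"), verbose=0)
```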

Use the "[CLS]" vector instead of the pooled result?

I'm not sure I understand this correctly. According to the BERT paper, the authors use the first vector (the "[CLS]" token) for classification. I see that your code uses the "pooled" vector instead. Is there a reason for this?

Thanks,
Li Sun
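For context on this question: in Google's reference BERT implementation, and the TF Hub modules built from it, the "pooled" output is derived from the [CLS] token. It is the final-layer hidden state at position 0 passed through one dense layer with a tanh activation. The `pooler/dense/kernel` (768, 768) and `pooler/dense/bias` (768,) variables in the listing above are that layer's weights, which are pre-trained with the next-sentence-prediction objective, and Google's own `run_classifier.py` classifies from `get_pooled_output()`. Below is a minimal NumPy sketch of the computation. The small sizes and random weights are hypothetical stand-ins, for shape-checking only.

```python
import numpy as np


def bert_style_pool(sequence_output, kernel, bias):
    """Sketch of BERT's pooler: dense layer + tanh on the first ([CLS]) token.

    sequence_output: (batch, seq_len, hidden) final-layer hidden states
    kernel:          (hidden, hidden) pooler weight matrix
    bias:            (hidden,) pooler bias
    """
    cls_vectors = sequence_output[:, 0, :]  # (batch, hidden): the [CLS] position
    return np.tanh(cls_vectors @ kernel + bias)


# Hypothetical sizes and random weights; real BERT-Base uses hidden = 768.
rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 5, 8
sequence_output = rng.normal(size=(batch, seq_len, hidden))
kernel = rng.normal(size=(hidden, hidden))
bias = np.zeros(hidden)

pooled = bert_style_pool(sequence_output, kernel, bias)
raw_cls = sequence_output[:, 0, :]  # the [CLS] vector without the pooler
```

So using the pooled output is not a departure from the [CLS]-based approach. The real choice is between the raw [CLS] vector and the same vector passed through the extra pre-trained dense + tanh layer. In this sketch, editing positions 1 onward of `sequence_output` leaves `pooled` unchanged. In the full model, the other tokens still influence the [CLS] hidden state through self-attention.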