
natural-question-answering's People

Contributors

dependabot[bot], see--


natural-question-answering's Issues

Error while running demo.py

When I try to run demo.py on Google Colab, I get the following error from the tokenizer.

ValueError: Non-consecutive added token 'td_colspan' found. Should have index 30522 but has index 1 in saved vocabulary.

Please help me resolve this error.
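
For context, transformers expects added tokens to be indexed starting at the base vocabulary size, so this error typically means the saved tokenizer files and the installed library version disagree about that ordering. A minimal sketch (assuming a transformers 2.x install and the stock bert-base-uncased vocabulary) showing where the expected index 30522 comes from:

from transformers import BertTokenizer  # transformers 2.x assumed

# The stock BERT uncased vocabulary has 30522 entries (ids 0..30521),
# so any token added on top of it must receive index 30522 or higher.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(len(tokenizer))  # 30522

num_added = tokenizer.add_tokens(["td_colspan"])
print(num_added, tokenizer.convert_tokens_to_ids("td_colspan"))  # 1 30522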

TypeError: add_tokens() got an unexpected keyword argument 'offset'

Hi everyone,
I'm trying to run this repo, but I hit the error below:

Traceback (most recent call last):  
  File "train_eval.py", line 481, in <module>
    main()
  File "train_eval.py", line 437, in main
    num_added = tokenizer.add_tokens(add_tokens, offset=offset)
TypeError: add_tokens() got an unexpected keyword argument 'offset' 

My environment:

  • Python: 3.7.0
  • Transformers: 2.2.0
  • Tensorflow: 2.0

I checked every release of transformers, but I haven't found a version whose add_tokens method accepts an offset argument.
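
The offset keyword is not part of any stock transformers release, so train_eval.py presumably expects a locally patched tokenizer class. For reference, a minimal sketch of the stock 2.x call, which takes only the token list and always appends at the end of the vocabulary (the add_tokens list here is a hypothetical stand-in, not the repo's actual tokens):

from transformers import BertTokenizer  # transformers 2.x assumed

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
add_tokens = ["td_colspan"]  # hypothetical stand-in for the repo's special tokens

# Stock API: new tokens are appended after the existing vocabulary;
# there is no keyword to control their starting index.
num_added = tokenizer.add_tokens(add_tokens)
print(num_added)  # 1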

Memory leak in model loaded from tf-hub

First of all thank you for the great work. Your Q&A model rocks. Really interesting to see what is possible for next level Q&A.

I played around with the model in tf-hub and noticed it has a memory leak.
Here is my code:

import tensorflow as tf
import tensorflow_hub as hub
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("tokenizer_tf2_qa")
model = hub.load("https://tfhub.dev/see--/bert-uncased-tf2-qa/1")

# (question, context) pairs; placeholder data so the repro is self-contained
data = [("Who wrote the novel?", "The novel was written by Jane Austen.")] * 1000

for question, context in data:
    # create input vector representation
    encoded = tokenizer.encode_plus(question, context, add_special_tokens=True)
    input_word_ids = encoded["input_ids"]
    input_mask = encoded["attention_mask"]
    input_type_ids = encoded["token_type_ids"]

    # convert to tf.int32 tensors with a batch dimension and pass through the model
    input_word_ids, input_mask, input_type_ids = map(
        lambda t: tf.expand_dims(tf.convert_to_tensor(t, dtype=tf.int32), 0),
        (input_word_ids, input_mask, input_type_ids),
    )
    outputs = model([input_word_ids, input_mask, input_type_ids])

I tested with both TensorFlow 2.1.0 and 2.2.0 on a CPU machine.

I wonder if this warning is related to the memory leak:

WARNING:tensorflow:5 out of the last 5 calls to <function recreate_function.<locals>.restored_function_body at 0x7f23a70d9680> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.

Any idea what could be the problem?
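
The retracing warning is plausibly the leak itself: the restored function is traced anew for every unseen input length, and the accumulated traces stay in memory. A minimal sketch of one common mitigation (not confirmed against this particular model), padding every example to a fixed length so only one shape, and hence one trace, ever exists; pad_to_max_length is the transformers 2.x spelling of this option:

encoded = tokenizer.encode_plus(
    question,
    context,
    add_special_tokens=True,
    max_length=512,
    pad_to_max_length=True,  # constant length -> constant [1, 512] shape -> one trace
)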

Trying to Run Tokenizer

Hi see--

I am trying to add tokens by
tokenizer.add_tokens(add_tokens, offset=offset)

But I get this error:

TypeError: add_tokens() got an unexpected keyword argument 'offset'

Are you using anything different?

Regards,
Ankur

Unable to run tfhub sample

I'm trying to run the tfhub sample and got the following error:
Model name 'tokenizer_tf2_qa' was not found in tokenizers model name list
I did pip install transformers. Can you help?
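
'tokenizer_tf2_qa' is not one of the hosted model names, so from_pretrained can only resolve it as a local directory containing the tokenizer files that ship with this repo. A minimal sketch, assuming those files have already been downloaded and extracted next to the script:

from transformers import BertTokenizer

# from_pretrained also accepts a local directory path; the explicit "./" makes
# clear we are loading files from disk, not a name from the hosted model list.
tokenizer = BertTokenizer.from_pretrained("./tokenizer_tf2_qa")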

Error while executing demo.py on cpu?

Hello,
I am getting the following error while executing demo.py on CPU with a longer custom document text. It works fine on GPU though.

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,512] = 512 is not in [0, 512)
[[{{node StatefulPartitionedCall/StatefulPartitionedCall/tf_bert_for_natural_question_answering/StatefulPartitionedCall/bert/StatefulPartitionedCall/embeddings/position_embeddings/embedding_lookup}}]] [Op:__inference_restored_function_body_89164]

Thanks
Mahesh
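
The 512 in the trace points at BERT's position-embedding table, which has only 512 learned positions (ids 0 through 511); any encoded sequence longer than that fails the embedding lookup. A minimal sketch of the usual guard, truncating the encoded pair to the model's limit (transformers 2.x, where setting max_length enables truncation by default):

encoded = tokenizer.encode_plus(
    question,
    context,
    add_special_tokens=True,
    max_length=512,  # truncate anything beyond BERT's 512 positions
)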

Model

Is there a pretrained model available?
