ilm's Issues

How to Run Jupyter Notebook?

It continually says I am missing certain things. I am new, and would really like to try it. Could I please have some guidance?

Early stopping

Where did you implement the early stopping based on PPL on the validation set in the training script? Thanks
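
For readers unfamiliar with the technique being asked about, here is a minimal sketch of early stopping on validation perplexity. It is not taken from the repository's training script; the class `EarlyStopper`, the helper `run_with_early_stopping`, and the parameters `patience` and `min_delta` are hypothetical names chosen for illustration. It assumes the validation loss is a mean per-token cross-entropy in nats, so that perplexity is `exp(loss)`.

```python
import math


class EarlyStopper:
    """Signal a stop once validation perplexity has failed to improve
    for `patience` consecutive evaluations (hypothetical helper)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_ppl = math.inf
        self.bad_evals = 0

    def update(self, val_loss):
        """Record one validation loss; return True if training should stop."""
        ppl = math.exp(val_loss)  # perplexity from mean cross-entropy (nats)
        if ppl < self.best_ppl - self.min_delta:
            self.best_ppl = ppl
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def run_with_early_stopping(val_losses, patience=3):
    """Replay a sequence of validation losses; return (stop_index, best_ppl)."""
    stopper = EarlyStopper(patience=patience)
    for step, loss in enumerate(val_losses):
        if stopper.update(loss):
            return step, stopper.best_ppl
    return len(val_losses) - 1, stopper.best_ppl
```

In a real training loop, `update` would be called after each validation pass, and a checkpoint would typically be saved whenever `best_ppl` improves.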

The issue during training.

I have set up the dataset according to your instructions. However, when I tried to train using your code, I encountered the following error. I have spent some time debugging it without a solution. Can you please advise on how to resolve this?

[image: screenshot of the error]

Infill a sentence as a continuation of a conditioning token?

I'd like to be able to do a version of sentence infilling that allows for conditioning the generation on a leading token—i.e., semantically prompting the generated infill sentence. In your estimation, would it be a big job to enable this kind of generation? I'm thinking of the way that initial words like "however", "therefore", "further", and so on can have a strong semantic effect on the kind of sentence infill generated.
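
One way this kind of conditioning might be approached is to place the leading token right after the start-of-infill marker, so that decoding continues from it. The sketch below illustrates that idea only and is not the repository's API: `infill_with_prefix`, `next_token_fn`, `start_id`, and `end_id` are hypothetical names, it assumes a single blank, and it assumes the model emits an end-of-infill token to close the span.

```python
def infill_with_prefix(context_ids, prefix_ids, next_token_fn,
                       start_id, end_id, max_new_tokens=20):
    """Generate one infill span that is forced to begin with `prefix_ids`.

    `next_token_fn` is a hypothetical callable mapping the token ids so far
    to the next token id (for example, greedy decoding from a language model).
    """
    # Context, then the start-of-infill marker, then the forced leading tokens.
    sequence = list(context_ids) + [start_id] + list(prefix_ids)
    generated = list(prefix_ids)
    for _ in range(max_new_tokens):
        token = next_token_fn(sequence)
        if token == end_id:  # the model closed the infilled span
            break
        sequence.append(token)
        generated.append(token)
    return generated
```

Here `prefix_ids` would hold the token ids for a word such as "however", and the returned ids (prefix included) would be decoded and substituted into the blank.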

Infill for fine tuned model

I have a fine-tuned GPT-2 model, trained on English text from a specific domain. The model input was tokenized with SentencePiece. How can I adapt that model to ILM, if that is possible?

Thank you.

How to get top k spans for a mask

Hi,

Thanks for releasing this! I've only just started to play with it but managed to get the example from the Jupyter Notebook working without any problems.

My questions:

  1. Is it possible to generate a phrase with a variable number of tokens (e.g. <= 2) in a mask?
     E.g. I like to _ on Tuesdays.
     Candidates: sing, eat cake, eat bread, dance, ...
     I tried using the <|startofinfill|> and <|endofinfill|> tokens, but the output didn't make sense:

     Input:  I am looking forward to<|startofinfill|><|endofinfill|> in this year summer camps.
     Output: I am looking forward to We have my entire gym membership. My mom's husband has taken my sister and I to their house. I can not wait to go. in this year summer camps.

  2. Can we also get a top-k list of candidates and probabilities for each mask?

Thanks!
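
As a rough illustration of the top-k part of the question, the sketch below ranks candidates for a single token position from raw model scores. It is not the repository's API: `top_k_with_probs` is a hypothetical helper, and it assumes you already have a list of logits aligned with a vocabulary list. Ranking multi-token spans would need something more, such as beam search or scoring sampled spans.

```python
import math


def top_k_with_probs(logits, vocab, k=5):
    """Return the k most probable (token, probability) pairs for one position."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort tokens by probability, highest first, and keep the first k.
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For example, `top_k_with_probs(scores, vocab, k=3)` would return the three highest-probability tokens for the first position of the blank, each paired with its softmax probability.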

How to use custom tokenizer?

I have a custom tokenizer that's just a BertTokenizer with a custom vocab, which was used when pretraining my GPT-2 model. I'm trying to specify it to the train_ilm.py script, but I hit a NotImplemented error that I'm not sure how to solve. Any thoughts?
