Comments (5)
It takes 5 days to complete 1 epoch on Wikipedia with 8 V100 GPUs. I believe the controllability of the models can be further increased by (1) increasing the latent dimension and (2) training longer.
from optimus.
Thanks for replying. How many epochs did you pre-train Optimus for? Specifically, what were the batch size and the number of pre-training steps?
For the results reported in the paper, I used the pre-trained model with 1 epoch and latent size 32.
Here is one example for the pre-training script:
https://github.com/ChunyuanLI/Optimus/blob/master/code/scripts/scripts_philly/train_vae_wikipedia_distributed.yaml
The batch size is 16 * 8 = 128 sentences. There are nearly 2M sentences in Wikipedia.
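To put a rough number on the pre-training steps asked about above, here is a minimal sketch of the arithmetic. It assumes the "16" is the per-GPU batch size and the "8" is the number of GPUs, and it treats "nearly 2M" as exactly 2,000,000 sentences, so the result is only an estimate. The variable names are illustrative and are not taken from the Optimus code.

```python
import math

# Figures quoted in this thread; the sentence count is approximate ("nearly 2M").
per_gpu_batch_size = 16      # assumption: the "16" in 16 * 8
num_gpus = 8                 # assumption: the "8" matches the 8 V100 GPUs
num_sentences = 2_000_000    # assumption: rounded value for "nearly 2M"

# Sentences consumed per optimizer step across all GPUs.
effective_batch_size = per_gpu_batch_size * num_gpus

# Steps needed to see every sentence once (one epoch).
steps_per_epoch = math.ceil(num_sentences / effective_batch_size)

print(f"effective batch size: {effective_batch_size}")
print(f"approx. steps per epoch: {steps_per_epoch}")
```

Under these assumptions, one epoch comes to roughly 15.6K steps. The real count depends on the exact number of sentences and on any gradient accumulation set in the training script.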
Thanks for your reply : )
#8 is a similar issue.