Comments (8)
@Santosh-Gupta & @JayYip thanks so much guys.
from docproduct.
> What GPU did you use to train your model? Is 8 GB of VRAM not enough? Does the OOM error come from loading the BioBERT model or from your architecture?
@JayYip or @ash3n should be able to answer that one.
> Did you use num_epochs 1 when training on "sampleData.csv" and get good results? If not, what parameters should I use?
@JayYip or @ash3n can confirm, but I believe we ran very few epochs. In the single digits, possibly 1.
> I noticed you used "Float16EmbeddingsExpanded.pkl" in the DocProductPresentation notebook but not when training our own QA. What is the importance of this file?
We could not load the original embeddings into Google Colab without it crashing.
> Is the answer auto-generated or just retrieved from "sampleData.csv"? If this is retrieval, the model must look in some kind of database or pool of QA pairs. Where is this?
We have two notebooks: one uses GPT-2 to generate answers, the other does simple retrieval.
> the model must look in some kind of database or pool of QA pairs. Where is this?
The "Float16EmbeddingsExpanded.pkl" file.
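For anyone curious what the retrieval step over that embedding file amounts to, here is a minimal cosine-similarity sketch. The data is a toy stand-in, not the repo's actual embeddings or code:

```python
import numpy as np

def retrieve(query_emb, answer_embs, top_k=3):
    """Return indices of the top_k stored answers most similar to the
    query by cosine similarity (the core of a simple-retrieval notebook)."""
    q = query_emb / np.linalg.norm(query_emb)
    a = answer_embs / np.linalg.norm(answer_embs, axis=1, keepdims=True)
    sims = a @ q                     # cosine similarity per stored answer
    return np.argsort(-sims)[:top_k]

# Toy pool of 5 stored answer embeddings (dim=4 for illustration only;
# the real file holds BERT-sized vectors in float16 to save memory).
pool = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [0.5, 0.5, 0.0, 0.0]])
query = np.array([1.0, 0.0, 0.0, 0.0])
print(retrieve(query, pool))  # -> [0 2 4]
```

In the real pipeline the query embedding comes from encoding the user's question with the same model that produced the stored answer embeddings.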
> Also, I couldn't find some of the resulting answers in "sampleData.csv". Where do these answers come from?
Try this one instead
https://github.com/Santosh-Gupta/datasets
To see if your text data is trainable, you can train on your data with just the FFNN: encode your texts with BERT (average all the context vectors of the second-to-last layer) and use separate FFNN layers for each of the question and answer embeddings. It won't be as good as training the BERT weights, but it's much faster and should give you decent results. If the results don't make sense this way, something may be off with your data.
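The frozen-BERT-plus-FFNN idea above can be sketched as follows. This is a minimal NumPy sketch with random stand-ins for the BERT outputs; all dimensions, layer sizes, and the in-batch softmax loss are assumptions for illustration, not the repo's actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pool(layer_states):
    """Average one layer's context vectors over the sequence axis,
    e.g. BERT's second-to-last layer: (seq_len, 768) -> (768,)."""
    return layer_states.mean(axis=0)

# Stand-in for pre-computed BERT encodings of 4 question/answer pairs
# (random here; dim=768 as in BERT-base).
q_emb = np.stack([mean_pool(rng.normal(size=(12, 768))) for _ in range(4)])
a_emb = np.stack([mean_pool(rng.normal(size=(12, 768))) for _ in range(4)])

# Separate single-layer FFNN tower per side (hypothetical sizes; only
# these weights would be trained, the BERT encoder stays frozen).
Wq = rng.normal(scale=0.02, size=(768, 256))
Wa = rng.normal(scale=0.02, size=(768, 256))
q_proj = np.maximum(q_emb @ Wq, 0.0)   # ReLU
a_proj = np.maximum(a_emb @ Wa, 0.0)

# In-batch softmax objective: question i should score highest with
# its own answer i among the batch.
scores = q_proj @ a_proj.T             # (4, 4) similarity matrix
log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
```

Because the encoder is frozen, you can precompute all embeddings once and iterate on the FFNN towers cheaply, which is what makes this a fast sanity check on the data.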
If you are using scientific/medical texts, you will want to use SciBERT or BioBERT, and then use bert-as-service to batch-encode your texts. If not, I would recommend using TensorFlow Hub or PyTorch Hub to mass-encode your texts. I especially recommend PyTorch's RoBERTa weights.
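Whichever encoder you pick, mass-encoding a corpus is mostly a chunking loop. A hedged sketch with a hypothetical encode_batch stand-in (swap in bert-as-service's client, or a Hub/Transformers model, for the real thing):

```python
import numpy as np

def encode_batch(texts):
    # Stand-in encoder: replace with e.g. a bert-as-service client call
    # or a Hugging Face model. Hypothetical output dim of 768.
    rng = np.random.default_rng(len(texts))
    return rng.normal(size=(len(texts), 768)).astype(np.float16)

def encode_corpus(texts, batch_size=64):
    """Encode texts in fixed-size batches and stack into one float16
    matrix. float16 halves memory, which is presumably why the repo
    ships Float16EmbeddingsExpanded.pkl rather than float32."""
    chunks = [encode_batch(texts[i:i + batch_size])
              for i in range(0, len(texts), batch_size)]
    return np.vstack(chunks)

corpus = [f"question {i}" for i in range(150)]
emb = encode_corpus(corpus)
# emb.shape == (150, 768), dtype float16
```

Batching matters because encoding one text at a time leaves the GPU idle; pick the largest batch_size that fits in VRAM.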
Sorry for the late reply.
> What GPU did you use to train your model? Is 8 GB of VRAM not enough? Does the OOM error come from loading the BioBERT model or from your architecture?
I used a Titan Xp, but I think there's something wrong with it since it still raised OOM even when the batch size was set to 1.
> Did you use num_epochs 1 when training on "sampleData.csv" and get good results? If not, what parameters should I use?
We trained for a couple of epochs. You can try something between 5 and 10.
@JayYip @ash3n now that the TF 2.0 hackathon is over, maybe we should switch to the PyTorch Hugging Face BERT, which is very lightweight. It runs on the Colab Tesla K80 GPU no problem. It is widely used and continuously updated.
> It runs on the Colab Tesla K80 GPU no problem.
TensorFlow and PyTorch are not that different in terms of GPU memory. A K80 should be fine for training a 12-layer transformer.
> It is widely used and continuously updated.
I agree with this point, but that will take some work. We need to change the input pipeline from tf.data to torch.utils.data and the model from Keras to PyTorch. It'll take a couple of days and I'm not sure whether I have time to do it.
True. Maybe for another project. I am actually working on archive manatee, which uses the exact same architecture: two-tower BERT.
@ronykalfarisi @JayYip Hey, can you help me out with running the model locally on my machine?
@abhijeet201998 The code is tested on a Linux machine with a Titan Xp GPU. I'm not 100% sure whether it will work on Windows or macOS.