Comments (5)
Thanks for answering the issue, @WadoodAbdul.
For now, `squad_multitask` is tied to the SQuAD dataset, but it's possible to use your own QA dataset, as @WadoodAbdul said.
If you don't want to use that script, you can use a custom dataset as follows:
- Process your dataset into the format the model expects (described in the README). You can reuse the code from the `prepare_data.py` script.
- Make sure the dataset returns `source_ids`, `target_ids` and `attention_mask`.
- Use your dataset here instead of loading the cached data.
The rest of the code can stay the same. Let me know if this helps or not.
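As a rough illustration of those steps, here is a minimal pure-Python sketch of a dataset that returns `source_ids`, `target_ids` and `attention_mask`. It assumes the README's highlight format (answer span wrapped in `<hl>` tokens) plus a `generate question: ` task prefix; `QGDataset`, the record keys and `toy_tokenize` are hypothetical names. In real use you would pass the model's tokenizer and match the padding, prefixes and max lengths that `prepare_data.py` actually uses.

```python
from typing import Callable, Dict, List, Sequence

HL = "<hl>"  # highlight token, assumed to match the README's highlight format


def highlight_answer(context: str, answer: str, answer_start: int) -> str:
    """Wrap the answer span inside the context with <hl> tokens."""
    end = answer_start + len(answer)
    if context[answer_start:end] != answer:
        raise ValueError(f"answer {answer!r} not found at offset {answer_start}")
    return f"{context[:answer_start]}{HL} {answer} {HL}{context[end:]}"


def pad_to(ids: Sequence[int], max_len: int, pad_id: int) -> List[int]:
    """Truncate to max_len, then right-pad with pad_id."""
    ids = list(ids)[:max_len]
    return ids + [pad_id] * (max_len - len(ids))


class QGDataset:
    """Hypothetical map-style dataset (implements __len__/__getitem__).

    `tokenize` is any callable str -> list[int]; in real use this would be
    the model's tokenizer rather than the toy one below.
    """

    def __init__(self, examples: List[Dict], tokenize: Callable[[str], List[int]],
                 max_source_len: int = 512, max_target_len: int = 32,
                 pad_id: int = 0, prefix: str = "generate question: "):
        self.examples = examples
        self.tokenize = tokenize
        self.max_source_len = max_source_len
        self.max_target_len = max_target_len
        self.pad_id = pad_id
        self.prefix = prefix  # assumed task prefix for a multitask model

    def __len__(self) -> int:
        return len(self.examples)

    def __getitem__(self, idx: int) -> Dict[str, List[int]]:
        ex = self.examples[idx]
        source = self.prefix + highlight_answer(
            ex["context"], ex["answer_text"], ex["answer_start"])
        src = list(self.tokenize(source))[: self.max_source_len]
        n_pad = self.max_source_len - len(src)
        return {
            "source_ids": src + [self.pad_id] * n_pad,
            "target_ids": pad_to(self.tokenize(ex["question"]),
                                 self.max_target_len, self.pad_id),
            # 1 for real tokens, 0 for padding positions
            "attention_mask": [1] * len(src) + [0] * n_pad,
        }


# Toy whitespace tokenizer for demonstration only; id 0 is reserved for padding.
_vocab: Dict[str, int] = {"<pad>": 0}


def toy_tokenize(text: str) -> List[int]:
    return [_vocab.setdefault(tok, len(_vocab)) for tok in text.split()]
```

For example, with a single record `{"context": "Paris is the capital of France.", "answer_text": "Paris", "answer_start": 0, "question": "What is the capital of France?"}` and `max_source_len=16`, `ds[0]["attention_mask"]` is ten 1s followed by six 0s, because the highlighted source splits into ten whitespace tokens. PyTorch's `DataLoader` accepts any object with `__len__` and `__getitem__`, though you would probably convert the lists to tensors (or use a collate function) before training.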
Hi patil-suraj, thank you for the nice repo. Could you show us how to fine-tune using a custom dataset?
I've tried changing the link in `squad_multitask.py`, but it keeps failing, and I had no luck loading the data directly with `nlp.load_dataset` in `prepare_data.py`.
The following are my datasets:
- https://raw.githubusercontent.com/hariesramdhani/test_repo/main/dev-v1.1.json
- https://raw.githubusercontent.com/hariesramdhani/test_repo/main/train-v1.1.json
Thank you very much
Yes, any dataset can be used to train the models.
If you put the data in SQuAD format and change the file paths used by the train and validation split generators in `squad_multitask.py`, you'll be good to go.
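Before pointing the split generators at a custom file, it can help to check that it really is in SQuAD v1.1 format. The following is a minimal sketch, assuming the standard `data -> paragraphs -> qas -> answers` nesting; the function names are hypothetical and not part of the repo. It flattens the file into one record per question and flags answers whose `answer_start` offset does not actually point at the answer text in the context.

```python
import json
from typing import Dict, Iterator, List


def iter_squad_examples(squad: Dict) -> Iterator[Dict]:
    """Flatten a SQuAD v1.1-style dict (data -> paragraphs -> qas -> answers)."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                if not qa.get("answers"):
                    continue  # skip questions without an answer span
                # Dev files can list several reference answers; keep the first.
                answer = qa["answers"][0]
                yield {
                    "id": qa["id"],
                    "context": context,
                    "question": qa["question"],
                    "answer_text": answer["text"],
                    "answer_start": answer["answer_start"],
                }


def load_squad_file(path: str) -> List[Dict]:
    """Read a local SQuAD-format JSON file into flat records."""
    with open(path, encoding="utf-8") as f:
        return list(iter_squad_examples(json.load(f)))


def find_bad_spans(examples: List[Dict]) -> List[str]:
    """Return ids whose answer_start does not point at answer_text in the context."""
    bad = []
    for ex in examples:
        start = ex["answer_start"]
        if ex["context"][start:start + len(ex["answer_text"])] != ex["answer_text"]:
            bad.append(ex["id"])
    return bad
```

Running `find_bad_spans(load_squad_file("train-v1.1.json"))` on a local copy should return an empty list if every answer offset is consistent with its context.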
@WadoodAbdul have you tried it? Were you able to load the model correctly? If so, can you share the snippet, please?
Thank you
@hariesramdhani I'm not sure if you are still stuck, but you need to pass `data_files='/path/to/file'` to `nlp.load_dataset()` (source: https://huggingface.co/docs/datasets/v0.4.0/add_dataset.html).
Example: `natural_question = nlp.load_dataset("natural_questions", data_files='/content/drive/My Drive/natural_questions', split=nlp.Split.TRAIN)`