mayer123 / udt-qa
Code for the paper "Open Domain Question Answering with A Unified Knowledge Interface" (ACL 2022)
License: GNU General Public License v3.0
Hi there!
First of all, I would like to express my appreciation for open-sourcing your work.
I'm reaching out because I ran into an issue while attempting the reader evaluation described in the project's README: I have been unable to download the necessary resources, as public access to them appears to be restricted.
Command
python download_data.py --resource cos.model.reader.ott --output_dir downloaded
Error stack
Requested resource from %s https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/models/ott_fie_checkpoint_best.pt
Download root_dir %s downloaded
File to be downloaded as %s /home1/deokhk_1/research/UDT-QA/downloaded/downloads/cos/model/reader/ott/ott_fie_checkpoint_best.pt
Traceback (most recent call last):
File "download_data.py", line 337, in <module>
main()
File "download_data.py", line 330, in main
download(args.resource, args.output_dir)
File "download_data.py", line 305, in download
local_file = download_resource(
File "download_data.py", line 283, in download_resource
wget.download(link, out=local_file)
File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/site-packages/wget.py", line 526, in download
(tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 409: Public access is not permitted on this storage account.
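For reference, the "File to be downloaded as" path in the log suggests that download_data.py maps the dotted resource key onto nested download directories. A rough sketch of that mapping, as a guess from the log output rather than the script's actual code:

```python
import os

def resource_to_local_path(resource_key: str, output_dir: str) -> str:
    """Map a dotted resource key (e.g. "cos.model.reader.ott") to the nested
    directory the file is saved under. This mirrors the path printed in the
    log above; it is an assumption, not the repository's implementation."""
    parts = resource_key.split(".")  # ["cos", "model", "reader", "ott"]
    return os.path.join(output_dir, "downloads", *parts)

path = resource_to_local_path("cos.model.reader.ott", "downloaded")
print(path)  # downloaded/downloads/cos/model/reader/ott
```

Regardless of the local path, the download itself fails at the HTTP layer, so the 409 above has to be resolved on the storage-account side.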
I kindly request your assistance in making the required resources accessible to the public.
Thank you!
I encountered an issue while attempting to download the data using the provided download_data.py script. Accessing https://msrdeeplearning.blob.core.windows.net/ fails with the following error:
PublicAccessNotPermitted
Public access is not permitted on this storage account. RequestId:e068a9cd-801e-0012-508f-d1d7ed000000 Time:2023-08-18T04:53:00.0657459Z
If this situation is intentional, I would appreciate your guidance on when I might be able to gain access to the data.
Thank you for your assistance.
I can't find a dataset named ott_expanded_train_iter2 anywhere; is it some refined dataset?
python -m torch.distributed.launch --nproc_per_node=16 train_dense_encoder_COS.py \
  train=biencoder_nq \
  train_datasets=[hotpot_train_single,hotpot_train_expanded,hotpot_question_link_train,hotpot_link_train,hotpot_train_rerank,ott_single_train,ott_expanded_train_iter2,ott_link_train,ott_rerank_train,NQ_train,NQ_train_table,NQ_train_rerank,NQ_train_table_rerank] \
  dev_datasets=[hotpot_dev_single,hotpot_dev_expanded,hotpot_question_link_dev,hotpot_link_dev,hotpot_dev_rerank] \
  val_av_rank_start_epoch=35 \
  output_dir=/output/dir \
  train.batch_size=12 \
  global_loss_buf_sz=1184000 \
  train.val_av_rank_max_qs=20000 \
  train.warmup_steps=8000 \
  encoder.pretrained_file=/path/to/cos_pretrained_4_experts.ckpt \
  checkpoint_file_name=dpr_biencoder \
  encoder.use_moe=True \
  encoder.num_expert=6 \
  encoder.moe_type=mod2:attn \
  train.use_layer_lr=False \
  train.learning_rate=2e-5 \
  encoder.encoder_model_type=hf_cos
Hello Kaixin!
I have a question about the dataset you have provided, the KB dataset of UDT-QA (https://huggingface.co/kaixinm/UDT-QA/blob/main/knowledge_sources/WD_graphs_V.zip), specifically grouped_WD_graphs.jsonl: some of the relations in it have length four.
{
"triples":[
[
"The Simpsons",
"award received",
"Primetime Emmy Award for Outstanding Animated Program"
],
// the entries below have four elements
[
"The Simpsons",
"Primetime Emmy Award for Outstanding Animated Program",
"winner",
"Marc Wilmore"
],
[
"The Simpsons",
"Primetime Emmy Award for Outstanding Animated Program",
"winner",
"Ken Keeler"
]
]
}
In this case, what does each element of the four-length entries mean? The third and fourth would be the relation and the target respectively, but I cannot tell what the first two are.
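To see how common these are, one can scan the JSONL and group entries by length. A minimal sketch using the example above (assuming each line of grouped_WD_graphs.jsonl is a JSON object with a "triples" list):

```python
import json
from collections import Counter

# One record shaped like the example above (a single line of the JSONL file)
line = json.dumps({
    "triples": [
        ["The Simpsons", "award received",
         "Primetime Emmy Award for Outstanding Animated Program"],
        ["The Simpsons", "Primetime Emmy Award for Outstanding Animated Program",
         "winner", "Marc Wilmore"],
        ["The Simpsons", "Primetime Emmy Award for Outstanding Animated Program",
         "winner", "Ken Keeler"],
    ]
})

record = json.loads(line)
# Count how many entries have 3 vs. 4 elements
lengths = Counter(len(t) for t in record["triples"])
print(lengths)  # Counter({4: 2, 3: 1})
```

For the real file, the same counting loop would just iterate over every line instead of one inline record.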
For testing End-to-end performance on HotpotQA, I downloaded reader model from
"https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/models/hotpot_reader_checkpoint_best.pt" and reader data from
"https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/results/HotpotQA/hotpot_dev_reader_2hops.json".
I can get the same results as in Table A3 of the paper, but I have a question about the reader data.
What corpus did you use for evaluating HotpotQA? Specifically, what is the source of the retrieved documents referenced by the "pos titles" and "title" keys in the reader data? I tried to find it, but I could only find details about the pretraining corpus.
Also, I wonder what the dump date of the "https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/data/HotpotQA/hotpot_corpus.jsonl" file is.
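One way to check which corpus the retrieved documents come from is to collect the titles referenced in the reader file and compare them against a candidate corpus. A rough sketch; the key names "pos titles" and "title" are taken from the question above, but the exact record layout is my assumption, so the walk below deliberately does not depend on it:

```python
import json

def collect_titles(obj, keys=("title", "pos titles")):
    """Recursively gather values stored under the given keys, at any nesting
    depth, to avoid assuming the exact layout of the reader-data records."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k in keys:
                found.extend(v if isinstance(v, list) else [v])
            else:
                found.extend(collect_titles(v, keys))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(collect_titles(item, keys))
    return found

# Hypothetical record for illustration only; not the file's actual schema
rec = json.loads('{"question": "q", "ctxs": [{"title": "A"}], "pos titles": ["B"]}')
print(sorted(set(collect_titles(rec))))  # ['A', 'B']
```

Intersecting the collected titles with the title field of hotpot_corpus.jsonl would then show whether that corpus is the retrieval source.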
Hello!
First, I would like to express my gratitude for your work.
I have a question regarding a problem I'm facing while trying to run COS inference on OTT-QA.
It seems that the files to be specified in the following arguments do not exist on Hugging Face. Can you help me with this?