
udt-qa's Issues

Questions Regarding UDT-QA Knowledge Base

Hello Kaixin!
I have a question about the dataset you have provided, the KB dataset of UDT-QA (https://huggingface.co/kaixinm/UDT-QA/blob/main/knowledge_sources/WD_graphs_V.zip).

  1. One of the files included in the zip, grouped_WD_graphs.jsonl, contains some relations of length four.
    Here is an example:
{
   "triples":[
      [
         "The Simpsons",
         "award received",
         "Primetime Emmy Award for Outstanding Animated Program"
      ],
      // four elements from below
      [
         "The Simpsons",
         "Primetime Emmy Award for Outstanding Animated Program",
         "winner",
         "Marc Wilmore"
      ],
      [
         "The Simpsons",
         "Primetime Emmy Award for Outstanding Animated Program",
         "winner",
         "Ken Keeler"
      ]
   ]
}

In this case, what does each element of the four-length relations mean? The third and fourth elements appear to be the relation and the target, respectively, but I cannot tell what the first two represent.
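For anyone inspecting the same file, a minimal sketch for separating the three-element triples from the four-element ones in grouped_WD_graphs.jsonl (the file layout follows the example above; the meaning of the four-element entries is exactly what this issue is asking about, so the code only splits by length and makes no assumption about the semantics):

```python
import json

def split_triples(jsonl_lines):
    """Split grouped_WD_graphs.jsonl entries into 3- and 4-element relations.

    Each line is a JSON object with a "triples" list, as in the example above.
    """
    plain, four_element = [], []
    for line in jsonl_lines:
        group = json.loads(line)
        for t in group.get("triples", []):
            (plain if len(t) == 3 else four_element).append(t)
    return plain, four_element
```

Usage: `split_triples(open("grouped_WD_graphs.jsonl"))` returns the two lists, so the four-element entries can be counted or sampled for inspection.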

  2. How did you select the knowledge base from Wikidata? The original source seems to offer many knowledge sources (https://www.wikidata.org/wiki/Wikidata:Database_download), and I found that the entire Wikidata JSON dump is about 120 GB. So I assume you extracted a subset rather than using the entire source. How did you filter the data? Can you tell me which part of Wikidata you used?

What is the ott_expanded_train_iter2 dataset in the COS training script?

I cannot find a dataset named ott_expanded_train_iter2 anywhere; is it some refined dataset?

python -m torch.distributed.launch --nproc_per_node=16 train_dense_encoder_COS.py train=biencoder_nq train_datasets=[hotpot_train_single,hotpot_train_expanded,hotpot_question_link_train,hotpot_link_train,hotpot_train_rerank,ott_single_train,ott_expanded_train_iter2,ott_link_train,ott_rerank_train,NQ_train,NQ_train_table,NQ_train_rerank,NQ_train_table_rerank] dev_datasets=[hotpot_dev_single,hotpot_dev_expanded,hotpot_question_link_dev,hotpot_link_dev,hotpot_dev_rerank] val_av_rank_start_epoch=35 output_dir=/output/dir train.batch_size=12 global_loss_buf_sz=1184000 train.val_av_rank_max_qs=20000 train.warmup_steps=8000 encoder.pretrained_file=/path/to/cos_pretrained_4_experts.ckpt checkpoint_file_name=dpr_biencoder encoder.use_moe=True encoder.num_expert=6 encoder.moe_type=mod2:attn train.use_layer_lr=False train.learning_rate=2e-5 encoder.encoder_model_type=hf_cos

Access Issue with Data Download from Storage Account

I encountered an issue while attempting to download the data using the provided download_data.py script. It appears that there is an error associated with accessing the following URL: https://msrdeeplearning.blob.core.windows.net/.

The error message displayed is as follows:

PublicAccessNotPermitted
Public access is not permitted on this storage account. RequestId:e068a9cd-801e-0012-508f-d1d7ed000000 Time:2023-08-18T04:53:00.0657459Z
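For what it's worth, the failure can be reproduced without download_data.py by probing the URL directly. A small sketch (the URL argument would be one of the blob links above; Azure returns HTTP 409 with "PublicAccessNotPermitted" when anonymous reads are blocked on the account):

```python
import urllib.request
import urllib.error

def check_public_access(url, timeout=10):
    """Return (ok, detail): ok is True if an anonymous GET succeeds,
    otherwise False with the HTTP status code or error reason."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, resp.status
    except urllib.error.HTTPError as e:
        # Azure Blob Storage answers 409 when public access is disabled.
        return False, e.code
    except urllib.error.URLError as e:
        return False, str(e.reason)
```

Running this against the blob URL before a long download makes it easy to tell an access-policy problem apart from a transient network error.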

If this situation is intentional, I would appreciate your guidance on when I might be able to gain access to the data.

Thank you for your assistance.

Unable to Download Required Resources for Reader Evaluation

Hi there!

First of all, I would like to express my appreciation for open-sourcing your work.

I'm reaching out because I encountered an issue while attempting to perform the reader evaluation as described in the project's ReadMe. Unfortunately, I've been unable to download the necessary resources needed for this evaluation. It appears that public access to these resources is currently restricted.

Command

python download_data.py --resource cos.model.reader.ott --output_dir downloaded

Error stack

Requested resource from %s https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/models/ott_fie_checkpoint_best.pt
Download root_dir %s downloaded
File to be downloaded as %s /home1/deokhk_1/research/UDT-QA/downloaded/downloads/cos/model/reader/ott/ott_fie_checkpoint_best.pt
Traceback (most recent call last):
  File "download_data.py", line 337, in <module>
    main()
  File "download_data.py", line 330, in main
    download(args.resource, args.output_dir)
  File "download_data.py", line 305, in download
    local_file = download_resource(
  File "download_data.py", line 283, in download_resource
    wget.download(link, out=local_file)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/site-packages/wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 409: Public access is not permitted on this storage account.

I kindly request your assistance in making the required resources accessible to the public.

Thank you!

Question about dataset

For testing End-to-end performance on HotpotQA, I downloaded reader model from
"https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/models/hotpot_reader_checkpoint_best.pt" and reader data from
"https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/results/HotpotQA/hotpot_dev_reader_2hops.json".

I can get the same results as in Table A3 of the paper, but I have a question about the reader data.
What corpus did you use for evaluating HotpotQA? Specifically, what is the source of the retrieved documents referenced by the "pos titles" and "title" keys in the reader data? I tried to find this, but I could only find details about the pretraining corpus.

Also, I would like to know the dump date of the "https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/data/HotpotQA/hotpot_corpus.jsonl" file.
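To make the question concrete, here is a sketch of how one could collect the titles referenced by the reader data and check them against the corpus file. The key names ("pos titles", "title") follow the question above; the exact schema of hotpot_dev_reader_2hops.json is an assumption on my part:

```python
def collect_titles(reader_examples):
    """Gather all document titles referenced by reader examples.

    Assumes each example may carry a "pos titles" list and/or a "title"
    string, per the key names mentioned above (schema not confirmed).
    """
    titles = set()
    for ex in reader_examples:
        titles.update(ex.get("pos titles", []))
        if "title" in ex:
            titles.add(ex["title"])
    return titles
```

Comparing this set against the titles in hotpot_corpus.jsonl would show whether every retrieved document comes from that corpus.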

Missing Files for Running COS Inference on OTT-QA

Hello!

First, I would like to express my gratitude for your work.
I have a question regarding a problem I'm facing while trying to run COS inference on OTT-QA.

It seems that the files to be specified in the following arguments do not exist on Hugging Face. Can you help me with this?

  • encoded_ctx_files=[/path/to/ott_table_original*]
  • ctx_datatsets=[/path/to/ott_table_chunks_original.json,/path/to/ott_wiki_passages.json,[/path/to/table_chunks_to_passages*]]
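Before running inference, a quick local check can confirm which of these inputs are actually present; the patterns below are copied from the two arguments above (the `/path/to/` prefixes are placeholders, as in the command):

```python
import glob
import os

def missing_inputs(patterns):
    """Return the path patterns (globs or plain paths) that match no file.

    Useful for the ott_table_original* and table_chunks_to_passages* globs
    expected by the COS inference arguments above.
    """
    return [p for p in patterns if not glob.glob(os.path.expanduser(p))]
```

Anything returned by `missing_inputs` is a file that still needs to be downloaded or generated before the encoded_ctx_files / ctx_datatsets arguments will resolve.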
