
udt-qa's Issues

Questions Regarding UDT-QA Knowledge Base

Hello Kaixin!
I have a question about the dataset you have provided, the KB dataset of UDT-QA (https://huggingface.co/kaixinm/UDT-QA/blob/main/knowledge_sources/WD_graphs_V.zip).

  1. One of the files included in the zip, grouped_WD_graphs.jsonl, contains some relations of length four.
    Here is an example:
{
   "triples":[
      [
         "The Simpsons",
         "award received",
         "Primetime Emmy Award for Outstanding Animated Program"
      ],
      // four elements from below
      [
         "The Simpsons",
         "Primetime Emmy Award for Outstanding Animated Program",
         "winner",
         "Marc Wilmore"
      ],
      [
         "The Simpsons",
         "Primetime Emmy Award for Outstanding Animated Program",
         "winner",
         "Ken Keeler"
      ]
   ]
}

In this case, what does each element of the four-length relations mean? The third and fourth elements appear to be the relation and the target, respectively, but I cannot tell what the first two represent.
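For anyone inspecting the same file, a minimal sketch for separating the three-element triples from the four-element ones in grouped_WD_graphs.jsonl (the file layout follows the example above; the meaning of the four-element entries is exactly what this issue is asking about, so the code only splits by length and makes no assumption about the semantics):

```python
import json

def split_triples(jsonl_lines):
    """Split grouped_WD_graphs.jsonl entries into 3- and 4-element relations.

    Each line is a JSON object with a "triples" list, as in the example above.
    """
    plain, four_element = [], []
    for line in jsonl_lines:
        group = json.loads(line)
        for t in group.get("triples", []):
            (plain if len(t) == 3 else four_element).append(t)
    return plain, four_element
```

Usage: `split_triples(open("grouped_WD_graphs.jsonl"))` returns the two lists, so the four-element entries can be counted or sampled for inspection.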

  2. How did you select the knowledge base from Wikidata? The original source seems to offer many knowledge sources (https://www.wikidata.org/wiki/Wikidata:Database_download), and I found that the entire Wikidata JSON dump is about 120 GB. So I assume you extracted a subset rather than using the entire source. How did you filter the data? Can you tell me which part of Wikidata you used?

What is the ott_expanded_train_iter2 dataset in the COS training script?

I cannot find a dataset named ott_expanded_train_iter2 anywhere; is it some refined dataset?

python -m torch.distributed.launch --nproc_per_node=16 train_dense_encoder_COS.py train=biencoder_nq train_datasets=[hotpot_train_single,hotpot_train_expanded,hotpot_question_link_train,hotpot_link_train,hotpot_train_rerank,ott_single_train,ott_expanded_train_iter2,ott_link_train,ott_rerank_train,NQ_train,NQ_train_table,NQ_train_rerank,NQ_train_table_rerank] dev_datasets=[hotpot_dev_single,hotpot_dev_expanded,hotpot_question_link_dev,hotpot_link_dev,hotpot_dev_rerank] val_av_rank_start_epoch=35 output_dir=/output/dir train.batch_size=12 global_loss_buf_sz=1184000 train.val_av_rank_max_qs=20000 train.warmup_steps=8000 encoder.pretrained_file=/path/to/cos_pretrained_4_experts.ckpt checkpoint_file_name=dpr_biencoder encoder.use_moe=True encoder.num_expert=6 encoder.moe_type=mod2:attn train.use_layer_lr=False train.learning_rate=2e-5 encoder.encoder_model_type=hf_cos

Access Issue with Data Download from Storage Account

I encountered an issue while attempting to download the data using the provided download_data.py script. It appears that there is an error associated with accessing the following URL: https://msrdeeplearning.blob.core.windows.net/.

The error message displayed is as follows:

PublicAccessNotPermitted
Public access is not permitted on this storage account. RequestId:e068a9cd-801e-0012-508f-d1d7ed000000 Time:2023-08-18T04:53:00.0657459Z
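For what it's worth, the failure can be reproduced without download_data.py by probing the URL directly. A small sketch (the URL argument would be one of the blob links above; Azure returns HTTP 409 with "PublicAccessNotPermitted" when anonymous reads are blocked on the account):

```python
import urllib.request
import urllib.error

def check_public_access(url, timeout=10):
    """Return (ok, detail): ok is True if an anonymous GET succeeds,
    otherwise False with the HTTP status code or error reason."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, resp.status
    except urllib.error.HTTPError as e:
        # Azure Blob Storage answers 409 when public access is disabled.
        return False, e.code
    except urllib.error.URLError as e:
        return False, str(e.reason)
```

Running this against the blob URL before a long download makes it easy to tell an access-policy problem apart from a transient network error.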

If this situation is intentional, I would appreciate your guidance on when I might be able to gain access to the data.

Thank you for your assistance.

Unable to Download Required Resources for Reader Evaluation

Hi there!

First of all, I would like to express my appreciation for open-sourcing your work.

I'm reaching out because I encountered an issue while attempting to perform the reader evaluation as described in the project's ReadMe. Unfortunately, I've been unable to download the necessary resources needed for this evaluation. It appears that public access to these resources is currently restricted.

Command

python download_data.py --resource cos.model.reader.ott --output_dir downloaded

Error stack

Requested resource from %s https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/models/ott_fie_checkpoint_best.pt
Download root_dir %s downloaded
File to be downloaded as %s /home1/deokhk_1/research/UDT-QA/downloaded/downloads/cos/model/reader/ott/ott_fie_checkpoint_best.pt
Traceback (most recent call last):
  File "download_data.py", line 337, in <module>
    main()
  File "download_data.py", line 330, in main
    download(args.resource, args.output_dir)
  File "download_data.py", line 305, in download
    local_file = download_resource(
  File "download_data.py", line 283, in download_resource
    wget.download(link, out=local_file)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/site-packages/wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home1/deokhk_1/anaconda3/envs/COS/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 409: Public access is not permitted on this storage account.

I kindly request your assistance in making the required resources accessible to the public.

Thank you!

Question about dataset

For testing End-to-end performance on HotpotQA, I downloaded reader model from
"https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/models/hotpot_reader_checkpoint_best.pt" and reader data from
"https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/results/HotpotQA/hotpot_dev_reader_2hops.json".

I can get the same results as in Table A3 of the paper, but I have a question about the reader data.
What corpus did you use for evaluating HotpotQA? Specifically, what is the source of the retrieved documents referenced by the "pos titles" and "title" keys in the reader data? I tried to find this, but I could only find details about the pretraining corpus.

Also, I would like to know the dump date of the "https://msrdeeplearning.blob.core.windows.net/udq-qa/COS/data/HotpotQA/hotpot_corpus.jsonl" file.
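To make the question concrete, here is a sketch of how one could collect the titles referenced by the reader data and check them against the corpus file. The key names ("pos titles", "title") follow the question above; the exact schema of hotpot_dev_reader_2hops.json is an assumption on my part:

```python
def collect_titles(reader_examples):
    """Gather all document titles referenced by reader examples.

    Assumes each example may carry a "pos titles" list and/or a "title"
    string, per the key names mentioned above (schema not confirmed).
    """
    titles = set()
    for ex in reader_examples:
        titles.update(ex.get("pos titles", []))
        if "title" in ex:
            titles.add(ex["title"])
    return titles
```

Comparing this set against the titles in hotpot_corpus.jsonl would show whether every retrieved document comes from that corpus.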

Missing Files for Running COS Inference on OTT-QA

Hello!

First, I would like to express my gratitude for your work.
I have a question regarding a problem I'm facing while trying to run COS inference on OTT-QA.

It seems that the files to be specified in the following arguments do not exist on Hugging Face. Can you help me with this?

  • encoded_ctx_files=[/path/to/ott_table_original*]
  • ctx_datatsets=[/path/to/ott_table_chunks_original.json,/path/to/ott_wiki_passages.json,[/path/to/table_chunks_to_passages*]]
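Before running inference, a quick local check can confirm which of these inputs are actually present; the patterns below are copied from the two arguments above (the `/path/to/` prefixes are placeholders, as in the command):

```python
import glob
import os

def missing_inputs(patterns):
    """Return the path patterns (globs or plain paths) that match no file.

    Useful for the ott_table_original* and table_chunks_to_passages* globs
    expected by the COS inference arguments above.
    """
    return [p for p in patterns if not glob.glob(os.path.expanduser(p))]
```

Anything returned by `missing_inputs` is a file that still needs to be downloaded or generated before the encoded_ctx_files / ctx_datatsets arguments will resolve.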
