lil-lab / nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

Home Page: http://lic.nlp.cornell.edu/nlvr/

Python 5.96% HTML 77.79% Jupyter Notebook 16.25%

computer-vision corpus machine-learning natural-language-processing

nlvr's Issues

Typos in the dataset

I found that there are many typos in the dataset (e.g., ciircles, yelllow, yelloe, ...). I can send a small PR that fixes some of the sentences with typos, but I was wondering whether you would rather keep the dataset as it is now (i.e., leave the typos as they are)?

Thanks for creating this cool dataset.

NLVR dataset uploaded & visualized at tagtog

Thank you so much for creating this great dataset.

I have uploaded the NLVR dataset to tagtog for easier visualization and exploration of the data.

Here is the project's link, with its guidelines/README: https://www.tagtog.net/NLVR/NLVR/-settings#tab-guidelines

Here, for instance, is a sample: https://www.tagtog.net/NLVR/NLVR/pool%2Ftrain/aWRfhf_ACQLgY5U9nULhEdhX8938-998_1.md?p=0&i=3

It looks like this:

Do you have some thoughts? Feedback? It would be interesting to entirely explore the NLVR2 dataset too.

The images have 4 channels

The given images have 4 colour channels, and the last channel looks like this:

Is there any particular reason for this?

Label distributions

I am looking at the distributions of labels for sentences across structured representations, and wanted to check if my observations are correct. I grouped structured representations by gathering those in examples whose identifiers have the same "m" values, where each identifier is of the form "m-n".

I imagined I would find each sentence occurring with four different structured representations, two in which it is true and two in which it is false. However, I see that:

(1) Only 3358 out of 3696 sentences in train, dev and test occur with 4 different structured representations; the others have fewer than four structures.
(2) Only 2176 out of the 3696 sentences have equal numbers of true and false labels.
(3) 465 out of the 3696 sentences have all true or all false labels.

I can see how (1) may have happened during post-processing to remove examples with low agreement, but I am not sure about (2) and (3). Please let me know if this is expected.
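The grouping described in this issue can be sketched roughly as follows. This is only an illustration, not the dataset's official tooling: the field names `identifier` and `label`, the `"true"`/`"false"` label strings, and the toy records are assumptions; the issue itself only states that identifiers have the form "m-n".

```python
from collections import defaultdict


def summarize_label_groups(examples):
    """Group examples by the "m" part of an "m-n" identifier and tally labels."""
    groups = defaultdict(list)
    for ex in examples:
        # Split on the last hyphen so that "m" may itself contain hyphens.
        m, _n = ex["identifier"].rsplit("-", 1)
        # Accept either boolean labels or "true"/"false" strings (assumed).
        groups[m].append(str(ex["label"]).lower() == "true")

    summary = {"total": len(groups), "four_structures": 0,
               "balanced": 0, "single_label": 0}
    for labels in groups.values():
        n_true = sum(labels)
        n_false = len(labels) - n_true
        if len(labels) == 4:
            summary["four_structures"] += 1
        if n_true == n_false:
            summary["balanced"] += 1
        if n_true == 0 or n_false == 0:
            summary["single_label"] += 1
    return summary


# Made-up records: group "1" is complete and balanced, group "2" is neither.
toy_examples = [
    {"identifier": "1-0", "label": "true"},
    {"identifier": "1-1", "label": "true"},
    {"identifier": "1-2", "label": "false"},
    {"identifier": "1-3", "label": "false"},
    {"identifier": "2-0", "label": "true"},
    {"identifier": "2-1", "label": "true"},
    {"identifier": "2-2", "label": "true"},
]
label_summary = summarize_label_groups(toy_examples)
print(label_summary)
```

The three counters correspond to observations (1), (2) and (3): groups with exactly four structures, groups with equal true/false counts, and groups with a single label value.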

Thanks!

Question about implementation details of Image Features+RNN

In the image features + RNN method, you use color, shape, size, etc. to construct a set of features for every object index. Let's take only color and shape into account as an example. Given an image with two objects:
Object 1 is represented as: color [0,0,1] shape [1,0,0]
Object 2 is represented as: color [0,1,0] shape [0,1,0]

You said you use the concatenation of the one-hot image features to compute the image embeddings with two layers of size 32.

Does this mean that you first concatenate the features of each object, namely Object 1 -> [0,0,1,1,0,0] and Object 2 -> [0,1,0,0,1,0], and then feed them into embedding layer 1 to produce vectors e1 and e2, which are concatenated again and fed into embedding layer 2 to produce the final image embedding?

Or does it mean that you concatenate all features of all objects, namely image -> [0,0,1,1,0,0,0,1,0,0,1,0], which is then embedded by the two embedding layers?

Maybe both of my interpretations are wrong.

Thank you for this cool corpus.
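To make the two readings in the question concrete, here is a small sketch that only builds the input vectors for each reading. It does not say which one the authors actually used; the helper names are hypothetical, and the one-hot values are the ones given in the question.

```python
def concat_object_features(obj):
    """Reading 1: one vector per object (color one-hot followed by shape one-hot)."""
    return obj["color"] + obj["shape"]


def concat_image_features(objects):
    """Reading 2: one vector for the whole image, all objects back to back."""
    flat = []
    for obj in objects:
        flat.extend(concat_object_features(obj))
    return flat


objects = [
    {"color": [0, 0, 1], "shape": [1, 0, 0]},  # Object 1 from the question
    {"color": [0, 1, 0], "shape": [0, 1, 0]},  # Object 2 from the question
]

per_object = [concat_object_features(o) for o in objects]
whole_image = concat_image_features(objects)
print(per_object)   # two 6-dimensional vectors
print(whole_image)  # one 12-dimensional vector
```

Under the first reading, each 6-dimensional vector would pass through the first layer separately before the results are combined; under the second, the single 12-dimensional vector would pass through both layers.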

Broken json files

When I try to parse any of the JSON files in Python or Julia, I get the following error:

>>> import json
>>> with open('train.json') as data_file:
...     data= json.load(data_file)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "erenay/anaconda/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "erenay/anaconda/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "erenay/anaconda/lib/python2.7/json/decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 991 column 1 (char 700 - 901156)
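An "Extra data" error that starts at line 2 is what Python's `json.load` raises when a file holds more than one top-level JSON value, for example one object per line. Assuming that is the layout here (the issue does not confirm it), a line-by-line reader along these lines may work. `read_json_lines` is a hypothetical helper name, and the demo records are made up.

```python
import io
import json


def read_json_lines(handle):
    """Parse a file-like object that holds one JSON value per line."""
    records = []
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


# In-memory stand-in for a file such as train.json (contents are invented).
demo_file = io.StringIO(
    '{"identifier": "1-0", "label": "true"}\n'
    '{"identifier": "1-1", "label": "false"}\n'
)
records = read_json_lines(demo_file)
print(len(records), records[0]["identifier"])
```

With a real file, the same helper would be called as `with open('train.json') as data_file: data = read_json_lines(data_file)`.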

License?

Would you be able to license the repository data and code? For the data, I might suggest the possible combination of ODbL (https://opendatacommons.org/licenses/odbl/1-0/; for the database) and CC0-1.0 (https://creativecommons.org/publicdomain/zero/1.0/; for the individual images).

File missing: image split folder when using CLIP4CIR

CLIP4CIR is trained on the CIRR dataset, which is based on NLVR2, and it requires something called image_splits. Could you please clarify how it works?

https://github.com/ABaldrati/CLIP4Cir#data-preparation

Vocabulary words

This isn't necessarily an issue, but rather an observation. It seems there are quite a few misspelt words in the example sentences.

Here is the output of a script which takes all examples, extracts their sentences, splits on spaces, lowercases all words, and deduplicates the words to form a simple vocabulary:

['1', 'more', '', 'isa', 'yelow', '.', 'has', 'numbers', 'block', 'having', 'have', 'no', 'base', 'leats', 'item,', 'back', 'other.', 'all', 'triangle,', 'odd', 'circles', 'circle', 'od', 'but', 'two', 'adge', 'black.', 'another', 'at', 'every', 'nealy', 'and', 'bases', 'touching', 'underneath', 't', 'sthe', 'box', 'same', 'one.', 'count', 'one', 'other', 'among', 'directly', 'objects', 'color.', 'do', 'wirh', 'bow', 'to', 'only.', '3', 'left', 'third', 'middle', 'than', 'that', 'less', 'closely', 'line', 'items,', 'shape.', 'out', 'boxes.', 'line.', 'bellow', 'ones.', 'triangle.', 'wwith', 'item', 'first', 'lot', 'nearly', 'ablue', 'triangle', 'coloured', 'not', 'both', 'box,', 'lease', 'box.', 'square,', 's', 'set', 'different.', 'over', 'colors.', 'alternately', 'them', 'just', 'in', 'between', 'smaller', '2', 'yelloe', 'objects,', 'squere', 'which', 'small', 'bottom-right', 'blue.', 'where', 'most', 'traingles.', 'ble', 'either', 'blicks', 'near', 'each', 'side', 'grey', ',', 'traingle', 'block.', 'bottom', 'beneath', '5', 'an', 'touhing', 'blue,', 'items.', 'bottom.', 'containing', 'positioned.', 'tocuhing', 'they', 'corner', 'height', 'base.', 'lest', 'square', 'three.', 'blocks,', 'consecutive', 'are', 'one,', 'total', 'yellow', 'different', 'towers', 'top.', 'item.', 'yelllow', 'block/', 'none', 'size.', 'bule', 'shapes', 'objects.', 'triangles.', 'only', 'number', 'some', 'size', 'least', 'ans', 'their', 'colour', 'objetcs', 'yellow.', 'ciircles', 'black,', 'any', 'second', 'corners.', 'middle.', 'contain', 'six', 'wih', 'medium', 'including', 'black', 'color', 'yellow,', 'circles.', 'attach', 'under', 'shape', 'all.', 'wth', 'height.', 'blacks', 'attached.', 'blocks', 'right', 'atleast', 'four', 'exacty', 'tow', 'theer', 'block,', 'each.', 'ones', 'bloxk', 'ia', 'bases.', 'big', 'exactly', 'items', 'then,idle', 'blccks', 'roof', 'squares.', 'eactly', 'this', 'attached', 'wall.', 'that.', 'colors', 'exacrly', 'is', 'corner.', 'ha', 'triangles', 'below', 
'bo', 'opis', '6', 'edge', 'squares', 'thee', 'or', 'with', 'a', 'there', 'colored', 'exacts', 'towers.', 'boxes', 'hte', 'lower', 'positions', 'made', 'three', 'kinds', 'egde', 'top', 'almost', 'it.', 'four.', 'it', 'single', 'type', 'most.', 'cirlce', 'i', 'abox.', 'blocks.', 'same.', 'square.', 'blue', 'after', 'colours', 'bkack', 'together', 'colour.', 'ad', 'its', 'even', 'close', 'tleast', '4', 'the', 'five', 'seven', 'trianlge', 'circle,', 'position.', 'tower.', 'as', 'above', 'stacked', 'without', 'al', 'many', 'circ;e', 'tower', 'blocks..', 'side.', 'from', 'multiple', 'object', 'level.', 'stack', 'rectangle', 'b;ue', 'of', 'tower,', 'being', 'object.', 'circle.', 'on', 'sqaures', 'contains', 'wall', 'll']

I was just wondering whether there's any agreed-upon convention for preprocessing the text?
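The script itself is not included in the issue. The following is a minimal sketch of the same steps (split on spaces, lowercase, deduplicate), plus one possible normalization that strips surrounding punctuation. The function name, the sample sentences, and the choice to strip punctuation are assumptions, not an agreed convention for this corpus.

```python
import string


def build_vocabulary(sentences, strip_punctuation=False):
    """Return the sorted set of lowercased, space-separated tokens."""
    vocab = set()
    for sentence in sentences:
        for token in sentence.split(" "):
            token = token.lower()
            if strip_punctuation:
                # Remove leading/trailing punctuation, e.g. "block." -> "block".
                token = token.strip(string.punctuation)
                if not token:
                    continue  # drop empty strings left by double spaces
            vocab.add(token)
    return sorted(vocab)


# Invented sentences; the second one contains a double space on purpose.
sample_sentences = [
    "There is a yellow block.",
    "there is a  Yellow block",
]
raw_vocab = build_vocabulary(sample_sentences)
clean_vocab = build_vocabulary(sample_sentences, strip_punctuation=True)
print(raw_vocab)
print(clean_vocab)
```

Splitting on single spaces without further cleanup yields entries such as the empty string and punctuation-suffixed duplicates, much like the `''` and `'block.'` entries in the list above. Stripping punctuation merges those, but it does not correct misspellings.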
