cannylab / vdtk
Visual Description Dataset Analysis Toolkit
License: MIT License
Hi David!
Thanks for the wonderful library for VL evaluation!
Here is a potential improvement I want to discuss with you. It is about expanding the format of media inputs.
Presently, in clip_recall.py the images are loaded from a local path. However, there are use cases where images are not stored as individual files but in one huge tar package.
I think it would be better to support binary inputs. There are multiple design choices here, e.g., store the binary string in "media_path", or add an additional field called "media_binary", etc. I have chosen the latter so far: xk-huang@afbd47a
I would certainly like to hear your ideas about it.
In addition, I cannot pull all the files stored with git-lfs because of the insufficient quota in the repo, so I cannot run the tests thoroughly.
Thanks in advance!
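To make the "media_binary" option concrete, here is a minimal sketch of how a loader could accept either field. It is not the code from the linked commit or from clip_recall.py; the helper names `open_media` and `read_tar_member` are hypothetical, and only the two field names come from the proposal above.

```python
import io
import tarfile
from typing import Any, BinaryIO, Dict


def open_media(sample: Dict[str, Any]) -> BinaryIO:
    """Return a binary file object for one sample (hypothetical helper).

    Prefers the proposed "media_binary" field (raw bytes) and falls back
    to "media_path" (a file on local disk).
    """
    binary = sample.get("media_binary")
    if binary is not None:
        return io.BytesIO(binary)
    return open(sample["media_path"], "rb")


def read_tar_member(tar_path: str, member_name: str) -> bytes:
    """Read one member of a tar archive into memory without extracting it."""
    with tarfile.open(tar_path) as tar:
        member = tar.extractfile(member_name)
        if member is None:
            # extractfile returns None for members that are not regular files
            raise FileNotFoundError(f"{member_name} is not a regular file in {tar_path}")
        return member.read()
```

Since PIL's `Image.open` accepts a file object as well as a path, the existing call could in principle become `Image.open(open_media(sample))`, with `sample["media_binary"]` filled from `read_tar_member(...)` when the images live in a tar package.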
I found VDTK quite useful as an implementation of all the semantic metrics at the dataset level, and used it to compare my remote sensing dataset with earlier ones such as RSICD and RSITMD. However, there are a few robustness issues in VDTK. Often a simple check would allow the computation to continue with partial results. I have only attempted corrections for some of these. Has anybody else encountered these issues and modified the code?
$uname -a
Linux minds01.irtse-pf.ext 5.11.0-41-generic #45~20.04.1-Ubuntu SMP Wed Nov 10 10:20:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# conda environment file: vdtk_min.yaml
name: geo2
channels:
  - conda-forge
  - defaults
dependencies:
  - pip=21.2.4
  - pip:
    - flake8-black>=0.3.6
    - tensorflow==2.12.0
    - sentence-transformers==2.2.2
    - bert-score==0.3.13
    - spacy==3.5.2
    - vdtk==0.3.0
    - rich==13.3.4
    - python-levenshtein
I wrote a simple converter to COCO format from the format used in RSICD, RSITMD and my own set.
These datasets are attached as a zip and are directly usable with vdtk.
ngram_stats fails on RSITMD in env geo2: "ngram_stats() got an unexpected keyword argument 'reference_key'", raised in click/core.py line 760, in invoke: return __callback(*args, **kwargs)
coreset fails on all sets: "division by zero" in core_set.py line 114, in coreset:
table.add_row(f"{f:.2f}", str(s), f"{s * 100 /len(test_data):.2f}%")
sample fails on geotruth (OK on RSICD and RSITMD): "max() arg is an empty sequence" in qualitative_sample.py line 86, in qualitative_sample:
best_bleu_mean_caption = max(mabs.items(), key=lambda x: x[1])
clip_recall on RSICD: "No such file or directory: 'airport_1.jpg'" in clip_recall.py line 36, in _get_feature: Image.open(media_path). The same happens on RSITMD for 'baseballfield_452.tif' and on geotruth for 'tile_535000-6245200.tif'.
clip_recall on RSICD: "zero-size array to reduction operation maximum which has no identity", raised by numpy at clip_recall.py line 163: np.amax(i)
Content Recall: all values are nan. Why?
Content Recall
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Dataset ┃ Noun Recall ┃ Verb Recall ┃ Noun Recall (Fuzzy) ┃ Verb Recall (Fuzzy) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ dataset_rsicd_v2_vdtk.json │ nan ± nan │ nan ± nan │ nan ± nan │ nan ± nan │
└────────────────────────────┴─────────────┴─────────────┴─────────────────────┴─────────────────────┘
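The coreset and sample failures above both come from reducing over an empty collection: the only divisor in the quoted coreset line is len(test_data), and max() is called on an empty mapping. The kind of "simple check" I have in mind could look like the sketch below. The helper names `safe_percent` and `best_item` are hypothetical and are not part of vdtk; this only illustrates the guard, not where it should go in the library.

```python
from typing import Dict, Optional, Tuple


def safe_percent(count: int, total: int) -> str:
    """Format count/total as a percentage, or "n/a" when total is zero."""
    if total == 0:
        return "n/a"
    return f"{count * 100 / total:.2f}%"


def best_item(scores: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return the (key, score) pair with the highest score, or None if empty."""
    return max(scores.items(), key=lambda x: x[1], default=None)
```

With guards like these a command could print a partial table (for example an "n/a" cell) and continue, instead of aborting the whole run on one empty split.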
Thanks for this useful toolkit. I noticed a reproducible issue with concept-leave-one-out: the process was killed after about 6 minutes (3 attempts). The first time, I was unsure whether the server was too loaded and/or the dataset a bit large, so I retried the command with no other load at all.
It appears that concept-leave-one-out does not work properly on 2 datasets on which other commands worked, and there is no error message or easy check.
I installed the package in a conda env with Python 3.9.13 on Ubuntu 20.04.1 (SMP, 64 cores).
'pip install vdtk' did not pull in the dependencies, so I installed those listed in pyproject.toml manually:
PRESENT in conda list: "pandas >= 1.5.1", "pytest", "nltk >= 3.6.5", "numpy >= 1.21.4", "matplotlib >= 3.5.0", "fuzzywuzzy >= 0.18.0", "fuzzysearch >= 0.7.3", "jdk4py >= 17.0.3.0", "rich >= 10.14.0", "mpire >= 2.3.1", "click >= 8.0.3", "embeddings >= 0.0.8", "mauve-text >= 0.3.0", "regex >= 2022.10.31", "rouge-score >= 0.1.2", "tqdm >= 4.62.3",
in pip list: "POT >= 0.8", "ftfy >= 6.1.1", "sentence-transformers >= 2.1.0", "bert-score >= 0.3.12", "spacy >= 3.2.0",
python3 -m spacy download en
pip install levenshtein # removes "Warning: Using slow pure-python SequenceMatcher" in fuzzywuzzy
conda install mypy isort
conda install flake8-black # => "black", "flake8",
pip install tensorflow==2.12.0 # "tf-slim >= 1.1.0"
conda install sentencepiece #>= 0.1.97
conda install scipy #>= 1.9.3
I used several commands (vocab-stats, caption-stats, semantic-variance, concept-overlap) to get metrics on small or medium remote sensing datasets with captions, and they gave plausible results.
I then tried concept-leave-one-out on dataset_RSITMD (23 715 captions), which worked, and then on RSICD (24 333 captions), where the main (interactive) process was killed unexpectedly during "Evaluating..." (which takes very long). I also noticed many vdtk processes spawned by this command:
$ vdtk concept-leave-one-out $METRICS/dataset_RSITMD_vdtk.json
Concept Set Leave-One-Out (Exact)
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Concept Set ┃ % Matches ┃ BLEU@1 ┃ BLEU@2 ┃ BLEU@3 ┃ BLEU@4 ┃ ROUGE-L ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Places365 │ 85.49% │ 0.82 +/- 0.13 │ 0.67 +/- 0.16 │ 0.57 +/- 0.19 │ 0.47 +/- 0.23 │ 0.69 +/- 0.14 │
│ MS-COCO │ 31.29% │ 0.75 +/- 0.14 │ 0.58 +/- 0.17 │ 0.47 +/- 0.20 │ 0.37 +/- 0.24 │ 0.60 +/- 0.15 │
│ ImageNet-1K │ 69.37% │ 0.77 +/- 0.14 │ 0.62 +/- 0.17 │ 0.51 +/- 0.20 │ 0.41 +/- 0.25 │ 0.65 +/- 0.15 │
│ Kinetics-400 │ 0.44% │ 0.63 +/- 0.16 │ 0.39 +/- 0.14 │ 0.24 +/- 0.15 │ 0.15 +/- 0.13 │ 0.44 +/- 0.10 │
│ Kinetics-600 │ 0.72% │ 0.59 +/- 0.16 │ 0.36 +/- 0.19 │ 0.24 +/- 0.19 │ 0.14 +/- 0.18 │ 0.43 +/- 0.13 │
└──────────────┴───────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┘
$ vdtk concept-leave-one-out $METRICS/dataset_rsicd_v2_vdtk.json
⠴ Evaluating... ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3% 0:00:34 0:06:43Killed
$ top
3846485 yves.ba+ 20 0 24,5g 1,4g 1072 S 0,3 1,1 10:05.57 vdtk
3846489 yves.ba+ 20 0 23,9g 813164 1952 S 0,3 0,6 6:53.97 vdtk
3846491 yves.ba+ 20 0 24,2g 1,1g 1248 S 0,3 0,9 8:42.48 vdtk
3846492 yves.ba+ 20 0 24,4g 1,4g 1984 S 0,3 1,1 8:55.44 vdtk
3846493 yves.ba+ 20 0 24,3g 1,2g 356 S 0,3 1,0 9:51.84 vdtk
3846495 yves.ba+ 20 0 24,2g 1,1g 1448 S 0,3 0,9 9:13.14 vdtk
I was surprised that they did not disappear when I logged out of the session in which I had run vdtk and killed the jupyter-notebook in which I had also called the vdtk function vocab_stats.
The sub-processes do not react to SIGTERM, but are killed by SIGKILL (9).
$ ps -x
3846483 ? Sl 8:33 /opt/home/yves.bardout/anaconda3/envs/gis/bin/python3.9 /opt/home/yves.bardout/anaconda3/envs/gis/bin/vdtk concept-leave-one-out /opt/home/yves.bardout/espace_de_travail/wp2/metrics/tls_ac.66_captions_ITR_vdtk.json
...
$ ps -x|grep vdtk|wc -l
129
$ killall vdtk # SIGTERM has no effect
$ killall -s 9 vdtk
$ ps -x|grep vdtk|wc -l
1
$ top
%Cpu(s): 0,1 us, 0,1 sy, 0,0 ni, 99,8 id,
$ vdtk concept-leave-one-out $METRICS/tls_ac.66_captions_ITR_vdtk.json
⠸ Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0:03:43 -:--:--
⠋ Evaluating... ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2% 0:06:36 0:00:40Killed
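As a stopgap for the leftover worker processes, one option is to launch the command in its own session so that the whole process group can be killed at once. This is a minimal, POSIX-only sketch using the standard library. The wrapper name `run_in_own_group` is hypothetical, and it assumes the workers stay in the process group of the vdtk parent (they would escape it if they started their own session).

```python
import os
import signal
import subprocess
from typing import List, Optional


def run_in_own_group(cmd: List[str], timeout: Optional[float] = None) -> int:
    """Run cmd in a new session; on timeout or Ctrl-C, kill its whole process group."""
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except (subprocess.TimeoutExpired, KeyboardInterrupt):
        # With start_new_session=True the child leads its own process group,
        # so its pid is also the group id. SIGKILL is used because the
        # workers in this report did not react to SIGTERM.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return proc.returncode
```

For example, `run_in_own_group(["vdtk", "concept-leave-one-out", "dataset.json"], timeout=3600)` would replace the manual `killall -s 9 vdtk` step when the run exceeds one hour. It does not address why the main process gets killed in the first place.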