
vdtk's People

Contributors

davidmchan, xk-huang


vdtk's Issues

More diverse formats of media inputs

Hi David!
Thanks for the wonderful library for VL evaluation!

Here is a potential improvement I would like to discuss with you: expanding the supported formats of media inputs.
Presently, in clip_recall.py the images are loaded from a local path. However, there are use cases where images are not stored as individual files but packed into a large tar archive.
I think it would be better to also support binary inputs. If so, there are multiple design choices here, e.g., storing the binary string in "media_path", or adding an additional field called "media_binary". I chose the latter so far: xk-huang@afbd47a.
I would certainly like to hear your ideas about it.
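For concreteness, a minimal sketch of what I have in mind (the "media_binary" field holding base64-encoded bytes is only my proposal, and load_media is an illustrative helper, not existing vdtk code):

import base64
import io
from typing import Optional

from PIL import Image


def load_media(sample: dict) -> Optional[Image.Image]:
    # Prefer an inline "media_binary" field (base64-encoded bytes, e.g. read
    # out of a tar archive beforehand) over the existing "media_path".
    if sample.get("media_binary") is not None:
        raw = base64.b64decode(sample["media_binary"])
        return Image.open(io.BytesIO(raw)).convert("RGB")
    if sample.get("media_path"):
        return Image.open(sample["media_path"]).convert("RGB")
    return None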

Besides, I cannot pull all the files stored with git-lfs due to insufficient quota in the repo, so the tests cannot be run thoroughly.

Thanks in advance!

robustness issues on commands: ngram_stats, coreset, qualitative_sample, clip_recall & content_recall

I found VDTK quite useful, as it implements all the semantic metrics at the dataset level, and I used it to compare my Remote Sensing dataset with previous ones such as RSICD and RSITMD. However, there are a few robustness issues in VDTK. Often a simple test would allow computation to continue with partial results (see the sketch after the observations below); I have only attempted corrections for some of these. Did anybody else encounter these issues and modify this code?

Environment

$uname -a
Linux minds01.irtse-pf.ext 5.11.0-41-generic #45~20.04.1-Ubuntu SMP Wed Nov 10 10:20:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# conda install follows: vdtk_min.yaml

name: geo2
channels:
  - conda-forge
  - defaults

dependencies:
- pip=21.2.4
- pip:
  - flake8-black>=0.3.6
  - tensorflow==2.12.0
  - sentence-transformers==2.2.2
  - bert-score==0.3.13
  - spacy==3.5.2
  - vdtk==0.3.0
  - rich==13.3.4
  - python-levenshtein

Observations

I coded a simple converter to COCO format from the format used in RSICD, RSITMD and my own set.
These datasets are attached as zip files and are directly usable with vdtk.
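For reference, a minimal sketch of that converter (the "images"/"sentences" input layout is the karpathy-style JSON shipped with RSICD and RSITMD; the output keys are assumptions based on fields mentioned in these issues, not a confirmed vdtk schema):

import json
import sys


def convert(in_json: str, out_json: str, image_dir: str = ".") -> None:
    # Convert a karpathy-style caption file to a flat list of samples.
    with open(in_json) as f:
        data = json.load(f)

    samples = []
    for img in data["images"]:
        samples.append(
            {
                "_id": img["filename"],                              # assumed id field
                "split": img.get("split", "train"),
                "references": [s["raw"] for s in img["sentences"]],  # assumed key
                "media_path": f"{image_dir}/{img['filename']}",
            }
        )

    with open(out_json, "w") as f:
        json.dump(samples, f, indent=2)


if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2], sys.argv[3] if len(sys.argv) > 3 else ".")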

  • ngram_stats fails on RSITMD in env geo2: ngram_stats() got an unexpected keyword argument 'reference_key' (click/core.py line 760, in invoke: return __callback(*args, **kwargs))

  • coreset fails on all sets: division by zero in core_set.py line 114, in coreset:
    table.add_row(f"{f:.2f}", str(s), f"{s * 100 /len(test_data):.2f}%")

  • qualitative_sample fails on geotruth (OK on RSICD and RSITMD): max() arg is an empty sequence, qualitative_sample.py line 86, in qualitative_sample:
    best_bleu_mean_caption = max(mabs.items(), key=lambda x: x[1])

  • clip_recall on RSICD: No such file or directory: 'airport_1.jpg' in clip_recall.py line 36, in _get_feature: Image.open(media_path). Also on RSITMD for 'baseballfield_452.tif' and on geotruth for 'tile_535000-6245200.tif'.

    • workaround: cd to the image directory for the given set works

  • clip_recall on RSICD: zero-size array to reduction operation maximum which has no identity, raised by numpy inside clip_recall.py line 163, np.amax(i)

  • Content Recall: all nan. Why?

                                            Content Recall
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Dataset                    ┃ Noun Recall ┃ Verb Recall ┃ Noun Recall (Fuzzy) ┃ Verb Recall (Fuzzy) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ dataset_rsicd_v2_vdtk.json │   nan ± nan │   nan ± nan │           nan ± nan │           nan ± nan │
└────────────────────────────┴─────────────┴─────────────┴─────────────────────┴─────────────────────┘
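To make the "simple test" point concrete, here is a sketch of the kind of guards that would let the commands above finish with partial results (the helper names and their placement are illustrative, not the actual vdtk internals):

import pathlib

import numpy as np


def safe_coreset_row(f, s, test_data):
    # Guards the division by zero seen in core_set.py line 114.
    pct = (s * 100 / len(test_data)) if test_data else 0.0
    return (f"{f:.2f}", str(s), f"{pct:.2f}%")


def safe_best_caption(mabs):
    # Guards the empty-sequence max() seen in qualitative_sample.py line 86.
    return max(mabs.items(), key=lambda x: x[1]) if mabs else None


def safe_amax(scores):
    # Guards the zero-size array seen in clip_recall.py line 163.
    return np.amax(scores) if np.size(scores) else float("nan")


def resolve_media_path(media_path, dataset_json_path):
    # Resolves image paths relative to the dataset JSON instead of the current
    # working directory, avoiding the "cd to the image directory" workaround.
    p = pathlib.Path(media_path)
    return p if p.is_absolute() else pathlib.Path(dataset_json_path).parent / p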


zombie processes of vdtk

Thanks for this useful toolkit. I noticed a reproducible issue with concept-leave-one-out, which gets killed after 6 minutes (attempted 3 times). The first time I was unsure whether the server was too loaded and/or the dataset a bit large, so I retried the command with no load at all.
It appears concept-leave-one-out does not work properly on 2 datasets on which the other commands worked, and there is no error message or easy way to check.

env

I installed the package in a conda env with Python 3.9.13 on Ubuntu 20.04.1 (SMP, 64 cores).
'pip install vdtk' did not pull in the dependencies, so I manually installed those listed in pyproject.toml:
PRESENT in conda list: "pandas >= 1.5.1", "pytest", "nltk >= 3.6.5", "numpy >= 1.21.4", "matplotlib >= 3.5.0", "fuzzywuzzy >= 0.18.0", "fuzzysearch >= 0.7.3", "jdk4py >= 17.0.3.0", "rich >= 10.14.0", "mpire >= 2.3.1", "click >= 8.0.3", "embeddings >= 0.0.8", "mauve-text >= 0.3.0", "regex >= 2022.10.31", "rouge-score >= 0.1.2", "tqdm >= 4.62.3",
in pip list: "POT >= 0.8", "ftfy >= 6.1.1", "sentence-transformers >= 2.1.0", "bert-score >= 0.3.12", "spacy >= 3.2.0",

python3 -m spacy download en
pip install levenshtein         # removes "Warning: Using slow pure-python SequenceMatcher" in fuzzywuzzy
conda install mypy isort
conda install flake8-black      # => "black", "flake8"
pip install tensorflow==2.12.0  # "tf-slim >= 1.1.0"
conda install sentencepiece     # >= 0.1.97
conda install scipy             # >= 1.9.3

run

I used several commands (vocab-stats, caption-stats, semantic-variance, concept-overlap) to get metrics on small and medium Remote Sensing captioning datasets, and they produced plausible results.

I tried concept-leave-one-out on dataset_RSITMD, which worked (23,715 captions), then on RSICD (24,333 captions), and had the main (interactive) process killed unexpectedly during "Evaluating..." (which is very long). I noticed many vdtk processes spawned by this command:

$ vdtk concept-leave-one-out $METRICS/dataset_RSITMD_vdtk.json

Concept Set Leave-One-Out (Exact)
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Concept Set  ┃ % Matches ┃ BLEU@1        ┃ BLEU@2        ┃ BLEU@3        ┃ BLEU@4        ┃ ROUGE-L       ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Places365    │    85.49% │ 0.82 +/- 0.13 │ 0.67 +/- 0.16 │ 0.57 +/- 0.19 │ 0.47 +/- 0.23 │ 0.69 +/- 0.14 │
│ MS-COCO      │    31.29% │ 0.75 +/- 0.14 │ 0.58 +/- 0.17 │ 0.47 +/- 0.20 │ 0.37 +/- 0.24 │ 0.60 +/- 0.15 │
│ ImageNet-1K  │    69.37% │ 0.77 +/- 0.14 │ 0.62 +/- 0.17 │ 0.51 +/- 0.20 │ 0.41 +/- 0.25 │ 0.65 +/- 0.15 │
│ Kinetics-400 │     0.44% │ 0.63 +/- 0.16 │ 0.39 +/- 0.14 │ 0.24 +/- 0.15 │ 0.15 +/- 0.13 │ 0.44 +/- 0.10 │
│ Kinetics-600 │     0.72% │ 0.59 +/- 0.16 │ 0.36 +/- 0.19 │ 0.24 +/- 0.19 │ 0.14 +/- 0.18 │ 0.43 +/- 0.13 │
└──────────────┴───────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┘

$ vdtk  concept-leave-one-out   $METRICS/dataset_rsicd_v2_vdtk.json
⠴ Evaluating... ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3% 0:00:34 0:06:43Killed

$ top
3846485 yves.ba+  20   0   24,5g   1,4g   1072 S   0,3   1,1  10:05.57 vdtk
3846489 yves.ba+  20   0   23,9g 813164   1952 S   0,3   0,6   6:53.97 vdtk
3846491 yves.ba+  20   0   24,2g   1,1g   1248 S   0,3   0,9   8:42.48 vdtk
3846492 yves.ba+  20   0   24,4g   1,4g   1984 S   0,3   1,1   8:55.44 vdtk
3846493 yves.ba+  20   0   24,3g   1,2g    356 S   0,3   1,0   9:51.84 vdtk
3846495 yves.ba+  20   0   24,2g   1,1g   1448 S   0,3   0,9   9:13.14 vdtk

I was surprised they did not disappear when I logged out of the session in which I had run vdtk and killed the jupyter notebook from which I had also called the vdtk function vocab_stats.

partial solution

The sub-processes do not react to SIGTERM, but can be killed with SIGKILL (9); a wrapper sketch that automates this is given after the logs below.

$ ps -x
3846483 ?        Sl     8:33 /opt/home/yves.bardout/anaconda3/envs/gis/bin/python3.9 /opt/home/yves.bardout/anaconda3/envs/gis/bin/vdtk concept-leave-one-out /opt/home/yves.bardout/espace_de_travail/wp2/metrics/tls_ac.66_captions_ITR_vdtk.json
...
$ ps -x|grep vdtk|wc -l
129

$ killall vdtk        # SIGTERM has no effect
$ killall -s 9 vdtk
$ ps -x|grep vdtk|wc -l
1
$ top
%Cpu(s):  0,1 us,  0,1 sy,  0,0 ni, 99,8 id, 
$ vdtk  concept-leave-one-out   $METRICS/tls_ac.66_captions_ITR_vdtk.json
⠸ Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:03:43 -:--:--

⠋ Evaluating... ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   2% 0:06:36 0:00:40Killed
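As a stop-gap until the worker cleanup is fixed upstream, a small wrapper along these lines keeps vdtk and all of its workers in a single process group and force-kills the group when the run finishes, times out, or is interrupted (a sketch, not part of vdtk):

#!/usr/bin/env python3
# Run a vdtk command and make sure all of its worker processes die with it.
import os
import signal
import subprocess
import sys


def run_vdtk(args, timeout=None):
    # start_new_session=True puts vdtk and every process it spawns into a
    # fresh process group, so they can all be signalled together.
    proc = subprocess.Popen(["vdtk", *args], start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    finally:
        try:
            # SIGTERM is ignored by the workers (see above), so use SIGKILL.
            os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        except ProcessLookupError:
            pass  # everything already exited


if __name__ == "__main__":
    sys.exit(run_vdtk(sys.argv[1:]))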
