this-is-my's Introduction

This-Is-My Dataset

Released as one of the contributions of Meta-Personalizing Vision-Language Models To Find Named Instances in Video (CVPR 2023).

Chun-Hsiao Yeh, Bryan Russell, Josef Sivic, Fabian Caba Heilbron, Simon Jenni
UC Berkeley, CIIRC CTU, Adobe Research
In CVPR 2023

Examples from This-Is-My: the meta-personalization dataset D (top), the test-time personalization dataset P (bottom-left), and the query-time dataset Q (bottom-right). The query-time dataset is designed as a challenging video instance retrieval task. For example, the named instance "Alex's piano" is in the background and barely visible, and for "Zak's dog Coffee" the background scenes in the query-time dataset are completely different from those in the test-time personalization dataset depicting the same named instance.

Dataset Overview

The This-Is-My dataset provides video segments and original videos for both the training and evaluation sets, along with annotated segments and captions for contextualized retrieval evaluation. The dataset is structured as follows:

<THISISMY_ROOT>/
    ├── train_segment/
    │   └── <SEGMENT_ID>.mp4, ...
    ├── eval_segment/
    │   └── <SEGMENT_ID>.mp4, ...
    ├── train_video/
    │   └── <{VIDEO_ID}_{VIDEO_NAME}>.mp4, ...
    ├── eval_video/
    │   └── <{VIDEO_ID}_{VIDEO_NAME}>.mp4, ...
    │
    └── this-is-my-dataset/
        ├── <SEGMENT>.csv
        ├── <TEST-SET>.json
        └── <EVAL-CAPTIONS>.csv
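
As a quick sanity check after downloading, the layout above can be verified with a few lines of Python. This is a minimal sketch: the folder names come from the tree above, while the helper names `missing_dirs` and `count_mp4` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Sub-folder names taken from the layout above.
# The helpers below are illustrative and not part of the repository.
EXPECTED_DIRS = (
    "train_segment",
    "eval_segment",
    "train_video",
    "eval_video",
    "this-is-my-dataset",
)


def missing_dirs(root):
    """Return the expected sub-folders that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


def count_mp4(root, subdir):
    """Count .mp4 files directly inside `root/subdir` (0 if the folder is absent)."""
    folder = Path(root) / subdir
    if not folder.is_dir():
        return 0
    return len(list(folder.glob("*.mp4")))
```

For example, `missing_dirs("<THISISMY_ROOT>")` returns an empty list once every folder in the tree is present.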

Get Started

To get started, we recommend creating a conda environment and installing the required packages using the following commands:

conda create --name this-is-my python=3.7
conda activate this-is-my
conda install pytorch==1.7.0 torchvision torchaudio cudatoolkit=11.0 -c pytorch

# packages for downloading video segments
conda install -c conda-forge pytube
conda install -c conda-forge moviepy
conda install -c anaconda pandas

How to Download the This-Is-My Dataset?

We have provided a simple script to download the dataset from scratch. Run the following command to download the video segments and original videos:

python download_video.py --MODE 'train'

This script will create two folders: train_segment, which contains the video segments of named instances, and train_video, which contains the original videos.

Note that you can replace 'train' with 'eval' to download video segments for evaluation as well.
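
To fetch both splits in one go, the two invocations can be wrapped in a small driver. This is only a sketch: it assumes `download_video.py` sits in the current working directory and accepts exactly the `--MODE` values shown above; `build_download_command` and `download_all` are hypothetical helper names.

```python
import subprocess
import sys

# The two modes mentioned above.
VALID_MODES = ("train", "eval")


def build_download_command(mode):
    """Build the argument list for one run of download_video.py."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [sys.executable, "download_video.py", "--MODE", mode]


def download_all(run=subprocess.run):
    """Run the download script once per split, stopping at the first failure.

    `run` is injectable so the command sequence can be checked without
    actually downloading anything.
    """
    for mode in VALID_MODES:
        run(build_download_command(mode), check=True)
```

Calling `download_all()` from the repository root runs the 'train' download first and then the 'eval' download.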

Retrieving This-Is-My Metadata

python thisismy_dataset.py

The dataset metadata can be retrieved with load_thisismy(ANNO_FILE, SEGMENT_FILE). The returned variables contain the following information:

  • train_x, eval_x: Arrays that include segment IDs for the train and evaluation data splits (e.g., ead408e4-e1b6-4256-9adf-043906a41170)

  • train_y, eval_y: Arrays that include token IDs (e.g., 0) for each segment. The token IDs can be mapped to instances using the token2item dictionary (e.g., {0: "Casey's friend marlan"})

  • train_class, eval_class: Arrays that include category IDs (e.g., 7) for each segment. The category IDs can be mapped to category names using the id2classname dictionary (e.g., {7: 'man', 8: 'piano'})

  • token2class: A dictionary that provides a hierarchical mapping between token IDs and category IDs. (e.g., {0: 7, 1: 7, 2: 10, 3: 0})
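
The mappings described above chain together as token ID → instance name and token ID → category ID → category name. The sketch below uses only the toy values from the examples in this list; the real arrays and dictionaries come from load_thisismy, and pairing the example segment ID with token 0 is an assumption made purely for illustration. The helper names are hypothetical.

```python
# Toy values copied from the examples above; the real ones come from load_thisismy.
token2item = {0: "Casey's friend marlan"}
id2classname = {7: "man", 8: "piano"}
token2class = {0: 7, 1: 7, 2: 10, 3: 0}

# Illustrative pairing only, not a real annotation.
train_x = ["ead408e4-e1b6-4256-9adf-043906a41170"]
train_y = [0]


def describe_token(token_id):
    """Return (instance name, category name); an entry is None if its mapping is absent."""
    class_id = token2class.get(token_id)
    return token2item.get(token_id), id2classname.get(class_id)


def segments_for_token(segment_ids, token_ids, token_id):
    """Return the segment IDs whose token ID equals `token_id`."""
    return [seg for seg, tok in zip(segment_ids, token_ids) if tok == token_id]


print(describe_token(0))
print(segments_for_token(train_x, train_y, 0))
```

With the full dictionaries loaded, the same two lookups recover the named instance and its category for any segment.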

The annotated evaluation captions can be retrieved with load_this_is_my_captions(CAPTIONS_FILE). The returned variables include:

  • captions: Annotated captions that describe the concept in the segment. (e.g., * is standing at the intersection)

  • class_names: The class of the named instance (e.g., man)
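
In the caption example above, the `*` appears to stand in for the named instance. Assuming that reading is correct, a caption can be turned into a plain-text query by substituting the instance name, as in this minimal sketch (`fill_caption` is a hypothetical helper, not a function from the repository):

```python
def fill_caption(caption, instance_name, placeholder="*"):
    """Replace the first placeholder in `caption` with `instance_name`.

    Assumes the placeholder marks where the named instance is mentioned.
    """
    return caption.replace(placeholder, instance_name, 1)


query = fill_caption("* is standing at the intersection", "Casey's friend marlan")
print(query)
```

Only the first placeholder is replaced, so any later asterisk in a caption is left untouched.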

How to Get Support?

If you have any general questions or need support, please feel free to contact: Chun-Hsiao Yeh, Simon Jenni, and Fabian Caba. Also, we encourage you to open an issue in the GitHub repository. By doing so, you not only receive support but also contribute to the collective knowledge base for others who may have similar inquiries.

Citation

If you find the This-Is-My dataset valuable and utilize it in your work, we kindly request that you consider giving our GitHub repository a ⭐ and citing our paper.

@inproceedings{yeh2023meta,
  title={Meta-Personalizing Vision-Language Models To Find Named Instances in Video},
  author={Yeh, Chun-Hsiao and Russell, Bryan and Sivic, Josef and Heilbron, Fabian Caba and Jenni, Simon},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19123--19132},
  year={2023}
}

this-is-my's Issues

Upload datasets to Hugging Face

Hi!

Very cool work! I found it in https://huggingface.co/papers/2306.10169. It would be nice to have the dataset on the Hugging Face Hub to make it easier to use.

Some of the benefits of sharing your models through the Hub would be:

  • versioning, commit history and diffs
  • repos provide useful metadata about their tasks, metrics, etc that make them discoverable
  • wider reach of your work to the ecosystem

Creating the repos and adding new datasets should be a relatively straightforward process if you've used Git before. This is a step-by-step guide explaining the process in case you're interested. Please let us know if you would be interested and if you have any questions.

Documentation

From the README/file structure, it is hard to tell which data corresponds to "meta-personalisation" dataset, which one to "test-time personalisation" (low-shot train, I guess), and which one to query dataset.

It would be helpful if there was a description as to what matches with what specifically.

Issues in downloading some videos

Dear authors,

Great work!

While downloading the videos, I saw that some videos fail to download because they are age-restricted. Do you have a workaround for this?

pytube.exceptions.AgeRestrictedError: _a2GpAq8q8g is age restricted, and can't be accessed without logging in.

Thanks!
