HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding

HA-ViD is the first human assembly video dataset that features representative industrial assembly scenarios, a natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and granular action annotations that break each action down into subject, action verb, manipulated object, target object, and tool. We provide 3222 multi-view, multi-modality videos (each video contains one assembly task), 1.5M frames, 96K temporal labels and 2M spatial labels.
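The five annotation elements named above (subject, action verb, manipulated object, target object, tool) can be pictured as a small record type. The sketch below is only an illustration: the class name, field names, optional fields, and the example values are assumptions and do not reflect the dataset's actual annotation file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionAnnotation:
    """One action label split into the five elements listed in the README.

    Hypothetical schema: field names and optionality are assumptions.
    """
    subject: str
    action_verb: str
    manipulated_object: str
    target_object: Optional[str] = None  # assumed optional
    tool: Optional[str] = None           # assumed optional

    def describe(self) -> str:
        """Join the elements that are present into one readable string."""
        fields = [
            self.subject,
            self.action_verb,
            self.manipulated_object,
            self.target_object,
            self.tool,
        ]
        return " | ".join(f for f in fields if f)


# Hypothetical example values, not taken from the dataset.
example = ActionAnnotation("left hand", "insert", "gear", "shaft")
print(example.describe())
```

Keeping the elements as separate fields, rather than one free-text label, is what would let a human and a robot share the same annotation vocabulary.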

Overview

This repository introduces the data structure of HA-ViD, explains how to download it, and contains the code for benchmarking algorithms for Action Recognition, Action Segmentation, Object Detection and Multi-Object Tracking.

The data is hosted on Dropbox. To download it, please submit an access request through our website.

Folder structure of HA-ViD data

The data folder is organized as follows:

data
├── HAViD_rgb
├── HAViD_depth
├── HAViD_skeleton
├── ActionRecognition
│   ├── mmskeleton
├── ActionSegmentation
│   ├── data
│   │   ├── features
│   │   ├── view0_lh_aa
│   │   │   ├── groundTruth
│   │   │   ├── splits
│   │   │   ├── mapping.txt
│   │   ├── view0_lh_pt
│   │   │   ├── ...
│   │   ├── ...
├── ObjectDetection
│   ├── train
│   │   ├── annotations
│   │   ├── images
│   ├── val
│   │   ├── annotations
│   │   ├── images
├── PretrainedCheckpoints
│   ├── ActionRecognition
│   │   ├── i3d_flow
│   │   ├── i3d_rgb
│   │   ├── mvit
│   │   ├── st_gcn
│   │   ├── timesformer
│   ├── ActionSegmentation
│   │   ├── BCN
│   │   │   ├── bcn_models
│   │   │   ├── bgm_models
│   │   ├── DTGRM
│   │   ├── ms-tcn
│   ├── ObjectDetection
│   │   ├── dino
│   │   ├── faster_rcnn_r50
│   │   ├── faster_rcnn_r101
│   │   ├── faster_rcnn_x101
│   │   ├── yolov5_l
│   │   ├── yolov5_s

The data folder contains the raw data (RGB videos, depth videos, and skeleton data), the annotated data for three tasks (Action Recognition, Action Segmentation, and Object Detection), and the checkpoints of the pretrained models for those three tasks.
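After downloading, it can be useful to confirm that the local copy matches the layout shown above. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the choice of which directories to check are assumptions, and the list covers only directories that appear explicitly in the tree.

```python
from pathlib import Path

# Directories taken from the folder tree in this README.
# Elided entries ("...") are deliberately not guessed.
EXPECTED_DIRS = [
    "HAViD_rgb",
    "HAViD_depth",
    "HAViD_skeleton",
    "ActionRecognition/mmskeleton",
    "ActionSegmentation/data/features",
    "ObjectDetection/train/annotations",
    "ObjectDetection/train/images",
    "ObjectDetection/val/annotations",
    "ObjectDetection/val/images",
    "PretrainedCheckpoints/ActionRecognition",
    "PretrainedCheckpoints/ActionSegmentation",
    "PretrainedCheckpoints/ObjectDetection",
]


def missing_dirs(root):
    """Return the expected subdirectories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # "data" is the top-level folder name used in the tree above.
    missing = missing_dirs("data")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected directories are present.")
```

An empty result only means the directories exist; it says nothing about whether the files inside them are complete.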

Benchmark

We benchmark algorithms on four tasks. Implementation details and code can be found in the corresponding subfolders:

Action Recognition

Action Segmentation

Object Detection

Multi-Object Tracking

Citation

If you find our code useful, please cite our paper.

@misc{zheng2023havid,
    title={HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding}, 
    author={Hao Zheng and Regina Lee and Yuqian Lu},
    year={2023},
    eprint={2307.05721},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgement

This work was supported by The University of Auckland FRDF New Staff Research Fund (No. 3720540).

License

HA-ViD is licensed by us under the Creative Commons Attribution-NonCommercial 4.0 International License. The terms are:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • NonCommercial: You may not use the material for commercial purposes.

Code of Conduct

The HA-ViD Code of Conduct can be found on our website.
