JAAD 2.0: Annotations and python interface

This repository contains new annotations for the Joint Attention in Autonomous Driving (JAAD) dataset. The annotations are provided in XML format and can be used with the newly introduced Python interface. The original annotations can be found here.

Download video clips: YorkU server | Google Drive

Annotations

JAAD annotations are organized according to video clip names. There are three types of labels: pedestrians (samples with behavior annotations), peds (bystanders that are far away and do not interact with the driver) and people (groups of pedestrians). Each pedestrian has a unique id in the form 0_<video_id>_<pedestrian_number>. Pedestrians with behavior annotations have the letter 'b' at the end of their id, e.g. 0_1_3b. The annotations for people follow the same pattern with the exception of ending with the letter 'p', e.g. 0_5_2p.
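
As a minimal sketch (the parse_ped_id helper below is hypothetical, not part of the interface), an id can be split into its components as follows:

# Hypothetical helper (not part of the JAAD interface): parse a pedestrian id.
def parse_ped_id(ped_id):
    _, video_id, tail = ped_id.split('_')
    if tail.endswith('b'):
        label_type = 'pedestrian'  # sample with behavior annotations
    elif tail.endswith('p'):
        label_type = 'people'      # group of pedestrians
    else:
        label_type = 'ped'         # bystander
    return int(video_id), int(tail.rstrip('bp')), label_type

print(parse_ped_id('0_1_3b'))  # (1, 3, 'pedestrian')
print(parse_ped_id('0_5_2p'))  # (5, 2, 'people')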

All samples are annotated with bounding boxes using two-point coordinates (top-left, bottom-right) [x1, y1, x2, y2]. The bounding boxes have corresponding occlusion tags. The occlusion values are either 0 (no occlusion), 1 (partial occlusion >25%) or 2 (full occlusion >75%).
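
For example, the pedestrian height used for scale filtering (see the 'height_rng' parameter below) follows directly from the two-point format; a minimal sketch with made-up coordinates:

# A bounding box in JAAD's two-point format [x1, y1, x2, y2] (example values).
box = [100, 150, 160, 330]
width = box[2] - box[0]   # x2 - x1
height = box[3] - box[1]  # y2 - y1
print(width, height)      # 60 180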

According to their types, the annotations are divided into 5 groups (a short access sketch follows this list):

  • Annotations: These include video attributes (time of day, weather, location), pedestrian bounding box coordinates, occlusion information and activities (e.g. walking, looking). The activities are provided only for a subset of pedestrians. These annotations are one per frame per label.
  • Attributes (pedestrians with behavior annotations only): These include information regarding pedestrians' demographics, crossing point, crossing characteristics, etc. These annotations are one per pedestrian.
  • Appearance (videos with high visibility only): These include information regarding pedestrian appearance such as pose, clothing and objects carried (see _get_ped_appearance() for more details). These annotations are one per frame per pedestrian.
  • Traffic: These provide information about traffic, e.g. signs and traffic lights, for each frame. These annotations are one per frame.
  • Vehicle: These are vehicle actions, e.g. moving fast, speeding up, one per frame.
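
As a brief sketch of how these groups can be accessed (the call mirrors the internal helper _get_annotations() shown in the issues below; the exact keys are taken from the interface output):

from jaad_data import JAAD

imdb = JAAD(data_path='<path_to_the_dataset_root_folder>')
anno = imdb._get_annotations('video_0001')
# Per-frame behavior labels for one pedestrian with behavior annotations
behavior = anno['ped_annotations']['0_1_3b']['behavior']
print(list(behavior.keys()))  # e.g. cross, look, action, hand_gesture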

Video clips

JAAD contains 346 video clips. These clips should be downloaded and placed in the JAAD_clips folder as follows:

JAAD_clips/video_0001.mp4
JAAD_clips/video_0002.mp4
...

To download the videos, either run the script download_clips.sh or manually download the clips from here and extract the zip archive.

Interface

Dependencies

The interface is written and tested using Python 3.5. It also requires the following external libraries:

  • opencv-python
  • numpy
  • scikit-learn
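
Assuming a standard Python environment (the repository does not pin exact versions), the dependencies can be installed with pip:

pip install opencv-python numpy scikit-learn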

Extracting images

In order to use the data, the video clips should first be converted into images. This can be done using the script split_clips_to_frames.sh or via the interface as follows:

from jaad_data import JAAD

# Path to the dataset root folder (replace with your own location)
jaad_path = '<path_to_the_dataset_root_folder>'
imdb = JAAD(data_path=jaad_path)
imdb.extract_and_save_images()

Using either of these methods will create a folder called images and save the extracted images, grouped by the corresponding video ids, in that folder:

images/video_0001/
    00000.png
    00001.png
    ...
images/video_0002/
    00000.png
    00001.png
    ...
...

Using the interface

When any of the data extraction methods is used, the interface first generates a database (by calling generate_database()) of all annotations in the form of a dictionary and saves it as a .pkl file in the cache directory (the default path is JAAD/data_cache). For more details regarding the structure of the database dictionary, see the comments for the function generate_database() in jaad_data.py.
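
As a sketch (assuming the default cache path, and the structure suggested by the interface code, where the database is keyed by video id), the cached database can be inspected directly:

import pickle

# Default cache location (an assumption; the cache directory is configurable).
with open('JAAD/data_cache/jaad_database.pkl', 'rb') as f:
    database = pickle.load(f)

# Each video entry holds, among other things, its pedestrian annotations.
print(database['video_0001']['ped_annotations'].keys())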

Parameters

The interface has the following configuration parameters:

data_opts = {'fstride': 1,
             'sample_type': 'all',
             'subset': 'high_visibility',
             'data_split_type': 'default',
             'seq_type': 'trajectory',
             'height_rng': [0, float('inf')],
             'squarify_ratio': 0,
             'min_track_size': 0,
             'random_params': {'ratios': None,
                               'val_data': True,
                               'regen_data': True},
             'kfold_params': {'num_folds': 5, 'fold': 1}}

'fstride'. Used for sequence data. The stride specifies the sampling resolution, i.e. every nth frame is used for processing.
'sample_type'. Specifies whether to extract all pedestrians ('all') or only the ones with behavior data ('beh').
'subset'. Specifies which subset of videos to use based on degree of visibility and resolution.
'data_split_type'. The JAAD data can be split into train/test or val in three different ways. 'default' uses the predefined train/val/test split specified in the .txt files in the split_ids folder. 'random' randomly divides pedestrian ids into train/test (or val) subsets depending on random_params (see method _get_random_pedestrian_ids() for more information). 'kfold' divides the data into k sets for cross-validation depending on kfold_params (see method _get_kfold_pedestrian_ids() for more information).
'seq_type'. Type of sequence data to generate (see Sequence analysis).
'height_rng'. Specifies the range of pedestrian scales (in pixels) to be used. For example, 'height_rng': [10, 50] only uses pedestrians within the range of 10 to 50 pixels in height.
'squarify_ratio'. Can be used to fix the aspect ratio (width/height) of bounding boxes. If set to 0, the original bounding boxes are returned.
'min_track_size'. The minimum allowable sequence length in frames. Shorter sequences will not be used.

Sequence analysis

There are three built-in sequence data generators, accessed via generate_data_trajectory_sequence(). The types of sequences generated are trajectory, intention and crossing. To create a custom data generator, follow a similar structure and add a function call to generate_data_trajectory_sequence() in the interface.
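
A typical call, with options matching the configuration parameters above (the call below mirrors usage shown in the issues further down; see generate_data_trajectory_sequence() for the full set of options):

from jaad_data import JAAD

imdb = JAAD(data_path='<path_to_the_dataset_root_folder>')
# Trajectory sequences for the training split, behavioral samples only
seq = imdb.generate_data_trajectory_sequence(image_set='train',
                                             seq_type='trajectory',
                                             sample_type='beh')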

Detection

The interface has a method called get_detection_data() which can be used to generate detection data. Currently, there are four built-in methods specified which either return data or produce and save data lists for models (see get_detection_data() for more information).

Citation

If you use our dataset, please cite:

@inproceedings{rasouli2017they,
  title={Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior},
  author={Rasouli, Amir and Kotseruba, Iuliia and Tsotsos, John K},
  booktitle={ICCVW},
  pages={206--213},
  year={2017}
}

@inproceedings{rasouli2018role,
  title={It’s Not All About Size: On the Role of Data Properties in Pedestrian Detection},
  author={Rasouli, Amir and Kotseruba, Iuliia and Tsotsos, John K},
  booktitle={ECCVW},
  year={2018}
}

Authors

Please send an email to [email protected] or [email protected] if there are any problems with downloading or using the data.

License

This project is licensed under the MIT License - see the LICENSE file for details.

The video clips are licensed under Creative Commons Attribution 4.0 International License.

jaad's People

Contributors: aras62, hibetterheyj, ykotseruba

jaad's Issues

Clarity on action and reaction

For video_0001, from the annotation I see for frame 363:
hand_gesture

and for frame 364:
reaction
action
hand_gesture

I am not able to locate what this "reaction"/"action" is in the readme. Could you please elaborate on this? How would these labels be useful when predicting intention? I would like to use an LSTM on the sequence of activities, but I am not sure about activities such as "reaction" and "action".

from jaad_data import JAAD
imdb = JAAD(data_path=jaad_path)  # jaad_path: dataset root folder
anno = imdb._get_annotations("video_0001")
for k, v in anno["ped_annotations"]["0_1_3b"]["behavior"].items():
    print(k)

Outputs:
cross
reaction
look
nod
action
hand_gesture

I was expecting "walking" in this output. What are the possible values apart from those above?

Pedestrian tracking error

In video number 102 there is one pedestrian crossing the street, but two pedestrians appear in the labels, so it is a tracking error, perhaps because the pedestrian disappears and reappears in the scene (in the visible field of the camera).
PS: The labels in the behavior data set are almost correct; change start_frame from 134 to 137.

Not able to use .mat file

Tried with the R2013a and R2016a versions of MATLAB. I am getting the errors below.
Which version of MATLAB was used for this?

load('video_0001.mat')
Error using load
Unable to read MAT-file C:\Users\kara9147\Documents\MATLAB\video_0001.mat: not
a binary MAT-file.
Try LOAD -ASCII to read as text.

Thanks,
Kalinga

Purpose of new python files

Hello, two new Python files have recently been added, and I could not find any readme about them:
jaad_data.py and jaad_eval.py. Why and how should they be used?

Is there a script available that converts the .vbb and .mat information to JSON? I find that once the bounding box information is stored in JSON, it is easier to work with in Python.

Thanks,
Kalinga

Normalization in pose data?

Hi,

Many thanks for your very useful work!

May I ask whether you have performed normalization on the pose data? I have observed some pose data and found that the values range between 0 and 1, so I am wondering about this.

Thanks a lot again!

Bests,
Xingchen

data format related problems

When I loaded video_0002, I ran into a problem with the information for pedestrian1. The number of pose direction entries (158) in appearance is not the same as the number of frames in ped_annotations (192). How can we get the correspondence between the pose direction labels and ped_annotations? In other words, what is the corresponding frame index of each pose direction entry?

About pedestrian '0_5_12b'

Hi,

Many thanks for your great work!

I have some questions about the annotation of pedestrian '0_5_12b', could you please kindly help to clarify?

From video_0005_attributes.xml, I know 0_5_12b is a female adult (the lady with a child).

In the video, this lady appears from frame 00000.png to 00158.png (she is fully occluded by a car from frame 00159). From around frame 00120.png she is partially occluded.

However, in video_0005.xml, she exists from frame 12 to frame 206. Also, this file says she is occluded from frame 12 to frame 77, has no occlusion from frame 78 to 166, and then is partially occluded from frame 167 to frame 204.

So I think the images and annotations for '0_5_12b' do not match.

Furthermore, in your latest benchmark work that provides pose data, pedestrian '0_5_12b' still has pose data from frame 00158.png to frame 00173.png, but she is actually fully occluded from frame 00159.png.

I am not sure if I made a mistake here... Could you please check when you have some time? That would be very helpful!

Many thanks for your help!

Bests,
Xingchen

Any reason behind making intention = 0 when sample_type = 'all'?

Hi,
Thanks a lot for this amazing work. I have recently started working with the JAAD data and am confused about this line:

JAAD/jaad_data.py, line 1292 in a958453:

intent = [[0]] * len(boxes)

Could you please let me know the reason behind setting intention to 0 for all pedestrians when sample_type = 'all'? According to my understanding, there should still be pedestrians with an intention of 1, even if sample_type='all'. Please correct me if I am wrong.

Thanks!

Sequence analysis for frames in a video

Hello, I want to do sequence analysis for the videos using predictions made by a neural network. I would like to know the error rate for the pedestrians. Please help me figure out how to do that.

Pedestrian behavior annotations do not match in xmls

Hi,

I'm facing some errors while trying to generate the trajectory sequences using the pedestrian behaviour annotations (branch JAAD_2.0, commit be57a06).
The error can be reproduced with the following commands:

from jaad_data import JAAD
jaad_path = '/home/vito/workspace/JAAD/'
imdb = JAAD(data_path=jaad_path)
seq = imdb.generate_data_trajectory_sequence(image_set='train', seq_type='trajectory', sample_type='beh')

and here is the output:

---------------------------------------------------------                 
Generating action sequence data   
data_split_type: default          
height_rng: [0, inf]                                                      
squarify_ratio: 0                                                
kfold_params: {'num_folds': 5, 'fold': 1}                                 
subset: default                   
seq_type: trajectory                 
random_params: {'regen_data': False, 'val_data': True, 'ratios': None}    
fstride: 1                                                                
sample_type: beh                                                          
min_track_size: 15                                                        
---------------------------------------------------------
Generating database for jaad      
Getting annotations for video_0001                                        
Getting annotations for video_0002                                        
Getting annotations for video_0003                               
...
Getting annotations for video_0346
The database is written to /home/vito/workspace/JAAD/data_cache/jaad_database.pkl
---------------------------------------------------------        
Generating trajectory data
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vito/workspace/JAAD/jaad_data.py", line 1031, in generate_data_trajectory_sequence
    sequence = self._get_trajectories(image_set, annot_database, **params)
  File "/home/vito/workspace/JAAD/jaad_data.py", line 1097, in _get_trajectories
    if annotations[vid]['ped_annotations'][pid]['attributes']['crossing'] == -1:
KeyError: 'crossing'
>>>
>>> import pdb
>>> pdb.pm()
> /home/vito/workspace/JAAD/jaad_data.py(1097)_get_trajectories()
-> if annotations[vid]['ped_annotations'][pid]['attributes']['crossing'] == -1:
(Pdb) list
1092                    ped_ids = [[pid]] * len(boxes)
1093
1094                    if params['sample_type'] == 'all':
1095                        intent = [[0]] * len(boxes)
1096                    else:
1097 ->                     if annotations[vid]['ped_annotations'][pid]['attributes']['crossing'] == -1:
1098                            intent = [[0]] * len(boxes)
1099                        else:
1100                            intent = [[1]] * len(boxes)
1101                    center = [self._get_center(b) for b in boxes]
1102
(Pdb) vid
'video_0001'
(Pdb) pid
'0_1_1b'
(Pdb)

It looks like annotations/ and annotations_attributes/ do not match.
For example, annotations_attributes/video_0001_attributes.xml lists two pedestrians: "0_1_2b" (old_id="pedestrian2") and "0_1_3b" (old_id="pedestrian1"). However annotations/video_0001.xml includes only two pedestrians with behavior: "0_1_1b" (old_id="pedestrian1") and "0_1_3b" (old_id="pedestrian2").

I'm afraid video_0001 is not the only mismatch... From the repo history I guess something must have happened with commit 3f19315. If I check out the previous commit (0e819c6), the above Python commands complete successfully.

Could you please check whether everything is OK with the annotations, or tell me if I am using the interface in the wrong way?

Thanks!
Vito

Using the interface to extract frames

Hi. Thank you for your work. I'm trying to use the interface to extract the frames from the videos, which I placed in JAAD\JAAD_clips. I'm giving the root folder (\JAAD) as the data path, but the images are not being extracted to the images folder. The images folder and sub-folders are all created normally, but the frames are saved in the root folder, and only the frames for video 1 are saved, while the rest aren't.
Could this be a Windows issue?

Is it possible to add pedestrian_unique_id on retinanet method results?

Thank you so much for providing such a wonderful dataset.
I have a question: is it possible to add pedestrian_unique_id to the retinanet method's CSV results? Currently it just generates frame ids, bounding boxes, and the pedestrian class without generating each pedestrian's unique ID.
If possible, could you share how to do it? I would appreciate any help you can provide.

Vehicle odometry

Hi,
Do you provide vehicle odometry information for each frame as well?
