Comments (10)

ykotseruba commented on July 19, 2024

In the example above, the first sample of 16 frames will start at 94 and end at 110, which is 60 frames TTE. The last sample starts at 125 and ends at 140, which is 30 frames TTE. In lines 383-384 we already subtract the observation length (16 frames) to ensure that this is the case. This corresponds to the description in the paper.
In other words, we count TTE at the last frame of the observation, not the first frame. If we started sampling at sequence_length - 60 then the TTE would be 46, not 60.

from pedestrianactionbenchmark.

ykotseruba commented on July 19, 2024

Hi Xingchen,
Thank you for your interest in our work.
To answer your question, from each pedestrian track we generate multiple observations of 16 frames (the default observation length) within 1-2s time-to-event (TTE). The overlap parameter controls the step of the sliding window. At 0 overlap the samples will start at frames 0, 16, 32, and so on, i.e. every observation sample starts after the previous one ends. At the maximum overlap of 1, the samples will be collected starting at every frame.
Therefore, the purpose of overlap is to increase the amount of training data. For a smaller dataset such as JAAD we use a higher overlap of 0.8 to get a number of training samples comparable to that generated from the PIE dataset with 0.6 overlap. In our implementation, the function action_predict.py:get_data_sequence() is where observation samples are extracted from pedestrian tracks.
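
A minimal sketch of how an overlap value could map to a sliding-window step. It assumes the step is `(1 - overlap) * obs_length`, truncated to an integer and clamped to at least 1. The helper names `window_step` and `window_starts` are hypothetical and not taken from action_predict.py. The rule is inferred from the behaviour described above (step 16 at overlap 0, step 1 at overlap 1), so treat it as an illustration only.

```python
def window_step(overlap: float, obs_length: int = 16) -> int:
    """Hypothetical helper: stride between consecutive observation starts."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be in [0, 1]")
    # Assumed rule: the non-overlapping fraction of the window, truncated.
    step = int((1.0 - overlap) * obs_length)
    # Clamp so that overlap == 1 yields a step of one frame, not zero.
    return max(1, step)


def window_starts(num_frames: int, overlap: float, obs_length: int = 16) -> list:
    """Start indices of every full window that fits in num_frames frames."""
    step = window_step(overlap, obs_length)
    return list(range(0, num_frames - obs_length + 1, step))


if __name__ == "__main__":
    for overlap in (0.0, 0.6, 0.8, 1.0):
        print(overlap, window_step(overlap))
```

Under this assumed rule a higher overlap gives a smaller step, and therefore more samples from the same track.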

xingchenzhang commented on July 19, 2024

Hi,

Thank you so much for your quick and detailed reply! You are very kind.

I think now I know what overlap means and how you handle the training data.

If I understand correctly, you used the training data in this way:

  1. Each pedestrian track you mentioned contains an event (cross or not) and 30 frames (1-2s before this event), say from frame 0 to frame 29.
  2. You apply a sliding window (controlled by 'overlap') within this 30-frame window prior to the event.
  3. If overlap is 1, then the samples will be collected starting at every frame. So we can generate 15 observations of 16 frames (frames 0-15, 1-16, 2-17, 3-18, 4-19, 5-20, 6-21, 7-22, 8-23, 9-24, 10-25, 11-26, 12-27, 13-28, 14-29). These 15 observations have the same label: C or NC.

Could you please kindly let me know if I understand correctly?

Thank you very much again for your nice work!

Bests,
Xingchen

ykotseruba commented on July 19, 2024

Hi Xingchen,
You are welcome and yes, your understanding is correct.
Yulia

xingchenzhang commented on July 19, 2024

Hi Yulia,

Thank you very much for your reply and confirmation!

Very nice work! Hope I can develop a new method using your data.

Bests,
Xingchen

xingchenzhang commented on July 19, 2024

Hi Yulia,

Sorry, I just read your paper again. I think I made a mistake in my previous reply.

Actually, for each pedestrian track you have 76 frames (16 for observation and 60 for TTE). You apply the sliding window on the first 46 frames rather than 30 frames, right? In my previous reply, I said that you applied the sliding window on the 30-frame window (1-2 seconds) because I forgot the observation period.

Could you let me know if now I understand correctly?

Thanks a lot!
Xingchen

ykotseruba commented on July 19, 2024

Hi Xingchen,
If the observation is between 1-2s TTE, we start observing 60 frames before the event and stop at 30 frames before the event. In this case, we apply a sliding window within the 30-frame range. If a single TTE instead of a range is set, then only 16 frames ending at that TTE are collected.

Please take a look at the function action_predict.py:get_data_sequence(). Pedestrian tracks stored in dictionary d are already cropped so that they end at the event (crossing or not crossing).
Lines 383-384 show how the first and last index of observation is computed.

For example, the track is 170 frames, i.e. the event happens 170 frames after the pedestrian appears on screen.
start_idx = track length - observation length - 60 = 170-16-60 = 94
end_idx = track length - observation length - 30 = 170-16-30 = 125
The 16 frame segments are sampled within 30 frame range starting at start_idx and ending at end_idx+1 with the step determined by the overlap parameter (at 0.8 it is every 3 frames, or every frame when overlap is 1).
Hope this clarifies your question.
Yulia

xingchenzhang commented on July 19, 2024

Hi Yulia,

I appreciate your detailed clarification very much!

Now I know what you mean. I previously thought that in this case the end_idx is 170-30 = 140. This is why I said the sliding window was applied to a 46-frame window.

By the way, I am just curious why you guys did not use an end_idx of 140 in this case. In your paper (on page 3) you said 'so the last frame of observation is between 1 and 2s (or 30-60 frames) prior to the crossing event start'. If you use 125 as the end_idx, then the last frame of observation is actually between 46-60 frames prior to the crossing event start.

Anyway, I am just curious about this. Maybe this is more practical in real cases.

Many thanks again for your help!

Bests,
Xingchen

xingchenzhang commented on July 19, 2024

Thank you very much Yulia! Now it is very clear!

ykotseruba commented on July 19, 2024

You are welcome :)

