human-scene-transformer's Introduction

๐Ÿ† Winner of the 2023 JRDB Trajectory Prediction Challenge - Reproduce our Result!

Human Scene Transformer

The (Human) Scene Transformer architecture (as described here and here) is a general and extensible trajectory prediction framework which treats trajectory prediction as a sequence-to-sequence problem and models it with a Transformer architecture.

It is straightforward to extend with

  • additional input features
  • a custom environment encoder (see the sketch below)
  • different loss functions
  • ...
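
As an illustration, a custom environment encoder can be any module that maps raw environment context (e.g. a point cloud) to a fixed-size feature. Below is a minimal, hypothetical sketch in the style of a Keras layer; the class name, shapes, and integration point are ours for illustration, not this repository's actual extension API:

import tensorflow as tf

class PointCloudEnvironmentEncoder(tf.keras.layers.Layer):
  """Hypothetical encoder: environment point cloud -> fixed-size feature."""

  def __init__(self, hidden_size=128, **kwargs):
    super().__init__(**kwargs)
    self.point_mlp = tf.keras.layers.Dense(hidden_size, activation='relu')
    self.output_projection = tf.keras.layers.Dense(hidden_size)

  def call(self, point_cloud):
    # point_cloud: [batch, num_points, 3]
    features = self.point_mlp(point_cloud)    # [batch, num_points, hidden]
    pooled = tf.reduce_max(features, axis=1)  # PointNet-style max pooling
    return self.output_projection(pooled)     # [batch, hidden]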

This is not an officially supported Google product.


Human Scene Transformer

Anticipating the motion of all humans in dynamic environments such as homes and offices is critical to enable safe and effective robot navigation. Such spaces remain challenging as humans do not follow strict rules of motion and there are often multiple occluded entry points such as corners and doors that create opportunities for sudden encounters. In this work, we present a Transformer based architecture to predict human future trajectories in human-centric environments from input features including human positions, head orientations, and 3D skeletal keypoints from onboard in-the-wild sensory information. The resulting model captures the inherent uncertainty for future human trajectory prediction and achieves state-of-the-art performance on common prediction benchmarks and a human tracking dataset captured from a mobile robot adapted for the prediction task. Furthermore, we identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error in such challenging scenarios.

If you use this work, please cite our paper:

@article{salzmann2023hst,
  title={Robots That Can See: Leveraging Human Pose for Trajectory Prediction},
  author={Salzmann, Tim and Chiang, Lewis and Ryll, Markus and Sadigh, Dorsa and Parada, Carolina and Bewley, Alex},
  journal={IEEE Robotics and Automation Letters},
  year={2023}, volume={8}, number={11}, pages={7090-7097},
  doi={10.1109/LRA.2023.3312035}
}

Prerequisites

Install requirements via pip install -r requirements.txt.

Please note that this codebase is not compatible with the Intel MKL backend for TensorFlow. The MKL backend supports tensors with at most 5 dimensions, which is not sufficient for parts of this codebase. If you have an MKL-backed TensorFlow installation or run into MKL-related errors, please disable the TensorFlow MKL backend by setting the environment variables TF_ENABLE_ONEDNN_OPTS=0 and TF_DISABLE_MKL=1.
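
As a sketch, the two variables can also be set from within Python, as long as this happens before TensorFlow is imported for the first time:

import os

# Disable the oneDNN/MKL backend; this only takes effect if it is set
# before the first `import tensorflow`.
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
os.environ['TF_DISABLE_MKL'] = '1'

import tensorflow as tf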

Data

JRDB

We provide an extensive pre-processing pipeline to convert the JRDB dataset, which was created as a detection and tracking dataset rather than a prediction dataset. To make the data suitable for a prediction task, we first extract the robot motion from the raw sensor data to account for the robot's movement. Further, on the JRDB training split we combine algorithmic detections with the ground-truth labels from the tracking dataset to create authentic tracks as input and labels for HST. Note that we do not purely use the hand-labeled ground-truth tracks in the JRDB train dataset, as we find them to be overly smoothed, giving away the future human movement. To adapt the JRDB dataset for prediction, please follow this README.

Make sure to adapt <data_path> in config/<jrdb/pedestrians>/dataset_params.gin accordingly.

If you want to use the JRDB dataset for trajectory prediction in PyTorch, we provide a PyTorch Dataset wrapper for the processed dataset.
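
A minimal usage sketch, assuming the wrapper follows the standard torch.utils.data.Dataset interface; the module path and class name below are hypothetical placeholders, not the repository's actual identifiers:

from torch.utils.data import DataLoader

# Hypothetical import; see the repository for the wrapper's actual
# module path and class name.
from human_scene_transformer.jrdb import torch_dataset

dataset = torch_dataset.JRDBPredictionDataset('<data_path>')  # hypothetical name
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in loader:
  ...  # e.g. batches of agent histories and prediction targets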

Pedestrians ETH/UCY

Please download the raw data here.

Training

JRDB

python train.py --model_base_dir=./model/jrdb  --gin_files=./config/jrdb/training_params.gin --gin_files=./config/jrdb/model_params.gin --gin_files=./config/jrdb/dataset_params.gin --gin_files=./config/jrdb/metrics.gin --dataset=JRDB

Pedestrians ETH/UCY

python train.py --model_base_dir=./models/pedestrians_eth  --gin_files=./config/pedestrians/training_params.gin --gin_files=./config/pedestrians/model_params.gin --gin_files=./config/pedestrians/dataset_params.gin --gin_files=./config/pedestrians/metrics.gin --dataset=PEDESTRIANS

JRDB Trajectory Prediction Challenge Results

To reproduce our winning results in the 2023 JRDB Trajectory Prediction Challenge:

  • Make sure that you follow the data pre-processing instructions and pay special attention to where the instructions differentiate between the JRDB Challenge dataset and the original paper dataset.

  • Download the trained challenge model here

  • Run

python jrdb/eval_challenge.py --model_path=<path_to_challenge_model_folder> --checkpoint_path=<path_to_challenge_model_folder>/ckpts/ckpt-20 --dataset_path=<dataset_path> --output_path=<result_folder>

Evaluation

JRDB

python jrdb/eval.py --model_path=./models/jrdb/ --checkpoint_path=./models/jrdb/ckpts/ckpt-30

Keypoints Impact Evaluation

python jrdb/eval_keypoints.py --model_path=./models/jrdb/ --checkpoint_path=./models/jrdb/ckpts/ckpt-30

vs.

python jrdb/eval_keypoints.py --model_path=./models/jrdb_no_keypoints/ --checkpoint_path=./models/jrdb_no_keypoints/ckpts/ckpt-30

Pedestrians ETH/UCY

python pedestrians/eval.py --model_path=./models/pedestrians_eth/ --checkpoint_path=./models/pedestrians_eth/ckpts/ckpt-20

Results

Compared to the published paper, we improved our data processing and fixed small bugs in this code release. If you compare against our method, please use the following updated results.

On the JRDB dataset with dataset options as set here:

         AVG    @ 1s   @ 2s   @ 3s   @ 4s
MinADE   0.26   0.12   0.20   0.28   0.37
MinFDE   0.45   0.21   0.39   0.56   0.71
NLL     -0.59  -0.90  -0.65  -0.08   0.32

On the ETH/UCY Pedestrians Dataset:

         ETH    Hotel  Univ   Zara1  Zara2  Avg
MinADE   0.41   0.10   0.24   0.17   0.14   0.21
MinFDE   0.73   0.14   0.44   0.30   0.24   0.37

JRDB Train / Test Split

The train / test split is implemented here.

Checkpoints

You can download trained model checkpoints for both JRDB and Pedestrians (ETH/UCY) datasets here.

To evaluate the pre-trained checkpoints you will have to adjust the path to the dataset in the respective params/operative_config.gin file.

Runtime

Evaluation of forward inference runtime with single output mode:

#Humans  M1 - CPU  A100 - GPU
1        40Hz      12Hz
10       30Hz      11Hz
20       23Hz      11Hz
50       12Hz      11Hz
100      5Hz       11Hz
150      -         11Hz


human-scene-transformer's Issues

tf Softmax input dimension error

System: Google Cloud, Debian 11
CPU: C3, 8 vCPU
Memory: 64 GB

Software version 1:

Python 3.9.2
numpy                         1.26.0
tensorflow                    2.14.0
open3d                        0.17.0
opencv-python-headless        4.8.1.78

Traceback:

Traceback (most recent call last):
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train.py", line 141, in <module>
    app.run(main)
  File "/home/47800/.local/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/47800/.local/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train.py", line 124, in main
    train_model.train_model(
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 252, in train_model
    train_step(train_iter)
  File "/home/47800/.local/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/47800/.local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.AbortedError: Graph execution error:

Detected at node while/body/_1/while/human_trajectory_scene_transformer/feature_attn_agent_encoder_learned_layer/multi_head_attention/softmax/Softmax defined at (most recent call last):
<stack traces unavailable>
Input dims must be <= 5 and >=1
         [[{{node while/body/_1/while/human_trajectory_scene_transformer/feature_attn_agent_encoder_learned_layer/multi_head_attention/softmax/Softmax}}]] [Op:__inference_train_step_67952]

This happens when I try to run train.py. I think it might be caused by the data loading or the preprocessed data itself.
I am still trying to fix it...

Software version 2:

Python == 3.8
tensorflow == 2.13

With these versions, the behavior changes:

WARNING:tensorflow:From /home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/jrdb/input_fn.py:555: load (from tensorflow.python.data.experimental.ops.io) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.load(...)` instead.
W1016 02:08:06.098888 140408121303680 deprecation.py:364] From /home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/jrdb/input_fn.py:555: load (from tensorflow.python.data.experimental.ops.io) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.load(...)` instead.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:3: Expected identifier, got: 4645555079012573616
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:1: Invalid control characters encountered in text.
......
......
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:337] Error parsing text-format tensorflow.data.experimental.DistributedSnapshotMetadata: 1:3: Expected identifier, got: 6459351093901513222
I1016 02:08:23.444825 140408121303680 train_model.py:151] Model created on device.
2023-10-16 02:08:23.611301: W tensorflow/core/framework/dataset.cc:956] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
I1016 02:08:24.023096 140408121303680 train_model.py:245] Beginning training.
Traceback (most recent call last):
  File "train.py", line 141, in <module>
    app.run(main)
  File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "train.py", line 124, in main
    train_model.train_model(
  File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 252, in train_model
    train_step(train_iter)
  File "/home/47800/miniconda3/envs/hstpy38/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_fileq49vjt2f.py", line 93, in tf__train_step
    ag__.for_stmt(ag__.converted_call(ag__.ld(tf).range, (ag__.converted_call(ag__.ld(tf).constant, (ag__.ld(train_params).batches_per_train_step,), None, fscope),), None, fscope), None, loop_body_1, get_state_4, set_state_4, (), {'iterate_names': '_'})
  File "/tmp/__autograph_generated_fileq49vjt2f.py", line 91, in loop_body_1
    ag__.converted_call(ag__.ld(strategy).run, (ag__.ld(step_fn),), dict(args=(ag__.converted_call(ag__.ld(next), (ag__.ld(iterator),), None, fscope),), options=ag__.converted_call(ag__.ld(tf).distribute.RunOptions, (), dict(experimental_enable_dynamic_batch_size=False), fscope)), fscope)
  File "/tmp/__autograph_generated_fileq49vjt2f.py", line 21, in step_fn
    loss_dict = ag__.converted_call(ag__.ld(loss_obj), (ag__.ld(output_batch), ag__.ld(predictions)), None, fscope_1)
  File "/tmp/__autograph_generated_file4bviil8d.py", line 12, in tf____call__
    retval_ = ag__.converted_call(ag__.ld(self).call, (ag__.ld(input_batch), ag__.ld(predictions)), None, fscope)
  File "/tmp/__autograph_generated_filevzzr7lrl.py", line 13, in tf__call
    loss_dict = (ag__.ld(position_loss) | ag__.ld(mixture_loss))
TypeError: in user code:

    File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/train_model.py", line 184, in step_fn  *
        loss_dict = loss_obj(output_batch, predictions)
    File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/losses.py", line 37, in __call__  *
        return self.call(input_batch, predictions)
    File "/home/47800/SocialNavigation_v2/human-scene-transformer/human_scene_transformer/losses.py", line 456, in call  *
        loss_dict = position_loss | mixture_loss

    TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
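
The final TypeError points at the Python version: the dict union operator | used in losses.py was only added in Python 3.9 (PEP 584), so it fails on the Python 3.8 setup above. A 3.8-compatible equivalent merge, as a sketch:

# `position_loss | mixture_loss` raises TypeError on Python 3.8 because
# dict union via `|` was introduced in Python 3.9 (PEP 584).
position_loss = {'position_nll': 1.2}  # illustrative values
mixture_loss = {'mixture_nll': 0.4}

# Equivalent merge that also works on Python 3.8:
loss_dict = {**position_loss, **mixture_loss}
assert loss_dict == {'position_nll': 1.2, 'mixture_nll': 0.4}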

Missing test dataset

Hello,
I am trying to test your model. However, when I download the test dataset, it automatically starts downloading the file with the training dataset. Should I go to the JRDB website to request their dataset? And which dataset should I request?

Also, when I try to run "python train.py", the "dataset_spec.pb" file is missing, and I could not find it anywhere within all the links you provided.

Thank you

_MAX_DISTANCE_TO_ROBOT lost in utils.filter_agents_and_ground_from_point_cloud 283

There might be a small issue in the pre-processing code in human-scene-transformer/human_scene_transformer/data/utils.py.

Original function:

def filter_agents_and_ground_from_point_cloud(
    agents_df, pointcloud_dict, robot_in_odometry_df):
  """Filter points which are in human bb or belong to ground."""
  for t, agent_df in agents_df.groupby('timestep'):
    pc_points = pointcloud_dict[t]
    robot_p = robot_in_odometry_df.loc[t]['p'][:2]
    dist_mask = np.linalg.norm(robot_p - pc_points[..., :2], axis=-1) < 10.
    pc_points = pc_points[
        (pc_points[:, -1] > -0.2) & (pc_points[:, -1] < 0.5) & dist_mask]
    for _, row in agent_df.iterrows():
      w, d = box_to_hyperplanes(
          row['p'], row['yaw'], 1.5*row['l'], 1.5*row['w'], row['h'])
      agent_pc_mask = np.all((pc_points @ w.T + d) > 0., axis=-1)
      pc_points = pc_points[~agent_pc_mask]
    np.random.shuffle(pc_points)
    pointcloud_dict[t] = pc_points
  return pointcloud_dict

The line

dist_mask = np.linalg.norm(robot_p - pc_points[..., :2], axis=-1) < 10.

should instead be

dist_mask = np.linalg.norm(robot_p - pc_points[..., :2], axis=-1) < _MAX_DISTANCE_TO_ROBOT

With this change, the function could look like:

def filter_agents_and_ground_from_point_cloud(
    agents_df, pointcloud_dict, robot_in_odometry_df, max_distance_to_robot):
  """Filter points which are in human bb or belong to ground."""
  for t, agent_df in agents_df.groupby('timestep'):
    pc_points = pointcloud_dict[t]
    robot_p = robot_in_odometry_df.loc[t]['p'][:2]
    dist_mask = np.linalg.norm(robot_p - pc_points[..., :2], axis=-1) < max_distance_to_robot
    pc_points = pc_points[
        (pc_points[:, -1] > -0.2) & (pc_points[:, -1] < 0.5) & dist_mask]
    for _, row in agent_df.iterrows():
      w, d = box_to_hyperplanes(
          row['p'], row['yaw'], 1.5*row['l'], 1.5*row['w'], row['h'])
      agent_pc_mask = np.all((pc_points @ w.T + d) > 0., axis=-1)
      pc_points = pc_points[~agent_pc_mask]
    np.random.shuffle(pc_points)
    pointcloud_dict[t] = pc_points
  return pointcloud_dict

filter_agents_and_ground_from_point_cloud(agents_df, pointcloud_dict, robot_in_odometry_df,
                                          max_distance_to_robot=_MAX_DISTANCE_TO_ROBOT)
