
lmdrive's Introduction

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

An end-to-end, closed-loop, language-based autonomous driving framework, which interacts with the dynamic environment via multi-modal multi-view sensor data and natural language instructions.

[Project Page] [Paper] [Dataset(hugging face)] [Model Zoo]

[Dataset(OpenXlab)] [Model Zoo(OpenXLab)]


News


Hao Shao, Yuxuan Hu, Letian Wang, Steven L. Waslander, Yu Liu, Hongsheng Li.

This repository contains code for the paper LMDrive: Closed-Loop End-to-End Driving with Large Language Models. This work proposes a novel language-guided, end-to-end, closed-loop autonomous driving framework.

Demo Video

demo_video.mp4

Contents

  1. Setup
  2. Model Weights
  3. Dataset
    1. Overview
    2. Data Generation
    3. Data Pre-processing
    4. Data Parsing
  4. Training
    1. Vision encoder pre-training
    2. Instruction finetuning
  5. Evaluation
  6. Citation
  7. Acknowledgements

Setup

Our project is built on three parts: (1) the vision encoder (corresponding repo: timm); (2) the vision LLM (corresponding repo: LAVIS); (3) data collection and the agent controller (corresponding repos: InterFuser, Leaderboard, ScenarioRunner).

Install anaconda

wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
bash Anaconda3-2020.11-Linux-x86_64.sh
source ~/.bashrc

Clone the repo and build the environment

git clone https://github.com/opendilab/LMDrive.git
cd LMDrive
conda create -n lmdrive python=3.8
conda activate lmdrive
cd vision_encoder
pip3 install -r requirements.txt
python setup.py develop # if you have installed timm before, please uninstall it
cd ../LAVIS
pip3 install -r requirements.txt
python setup.py develop # if you have installed LAVIS before, please uninstall it

pip install flash-attn --no-build-isolation # optional

Download and setup CARLA 0.9.10.1

chmod +x setup_carla.sh
./setup_carla.sh
pip install carla

If you encounter any problems related to CARLA, please refer to the CARLA Issues and InterFuser Issues pages first.

LMDrive Weights

If you are interested in including any other details in Model Zoo, please open an issue :)

Version | Size | Checkpoint | VisionEncoder | LLM-base | DS (LangAuto) | DS (LangAuto-short)
--- | --- | --- | --- | --- | --- | ---
LMDrive-1.0 (LLaVA-v1.5-7B) | 7B | LMDrive-llava-v1.5-7b-v1.0 | R50 | LLaVA-v1.5-7B | 36.2 | 50.6
LMDrive-1.0 (Vicuna-v1.5-7B) | 7B | LMDrive-vicuna-v1.5-7b-v1.0 | R50 | Vicuna-v1.5-7B | 33.5 | 45.3
LMDrive-1.0 (LLaMA-7B) | 7B | LMDrive-llama-7b-v1.0 | R50 | LLaMA-7B | 31.3 | 42.8

DS denotes the driving score

Dataset

We aim to develop an intelligent driving agent that generates driving actions from three sources of input: 1) sensor data (multi-view camera and LiDAR), so that the agent generates actions that are aware of and compliant with the current scene; 2) navigation instructions (e.g. lane changing, turning), so that the agent drives to meet requirements given in natural language (instructions from humans or navigation software); and 3) human notice instructions, so that the agent can interact with humans and adapt to their suggestions and preferences (e.g. paying attention to adversarial events, dealing with long-tail events, etc.).

We provide a dataset with about 64K data clips, where each clip includes one navigation instruction, several notice instructions, a sequence of multi-modal multi-view sensor data, and control signals. Each clip spans 2 to 20 seconds. The dataset used in our paper can be downloaded here. If you want to create your own dataset, please follow the steps outlined below.

Overview

The data is generated with leaderboard/team_code/auto_pilot.py in 8 CARLA towns using the routes and scenarios files provided at leaderboard/data on CARLA 0.9.10.1. The dataset is collected at a high frequency (~10Hz).

Once you have downloaded our dataset or collected your own dataset, it's necessary to systematically organize the data as follows. DATASET_ROOT is the root directory where your dataset is stored.

├── $DATASET_ROOT
│   └── dataset_index.txt  # for vision encoder pretraining
│   └── navigation_instruction_list.txt  # for instruction finetuning
│   └── notice_instruction_list.json  # for instruction finetuning
│   └── routes_town06_long_w7_11_28_18_28_35  # data folder
│   └── routes_town01_short_w2_11_16_08_27_10
│   └── routes_town02_short_w2_11_16_22_55_25
│   └── routes_town01_short_w2_11_16_11_44_08
      ├── rgb_full
      ├── lidar
      └── ...

The navigation_instruction_list.txt and notice_instruction_list.txt files can be generated with the data parsing scripts described below. Each subfolder in the dataset you've collected should be structured as follows:

- routes_town(town_id)_{tiny,short,long}_w(weather_id)_timestamp: corresponding to different towns and routes files
    - routes_X: contains data for an individual route
        - rgb_full: a big multi-view camera image at 400x1200 resolution, which can be split into four images (left, center, right, rear)
        - lidar: 3d point cloud in .npy format. It only includes the LiDAR points captured in 1/20 second, covering 180 degrees of horizontal view. So if you want to utilize 360 degrees of view, you need to merge it with the data from lidar_odd.
        - lidar_odd: 3d point cloud in .npy format.
        - birdview: top-down segmentation images; LAV and LBC use this type of data for training
        - topdown: similar to birdview but it's captured by the down-facing camera
        - 3d_bbs: 3d bounding boxes for different agents
        - affordances: different types of affordances
        - actors_data: contains the positions, velocities and other metadata of surrounding vehicles and the traffic lights
        - measurements: contains ego agent's position, velocity, future waypoints, and other metadata
        - measurements_full: merges measurements and actors_data
        - measurements_all.json: merges the files in measurements_full into a single file

The $DATASET_ROOT directory must contain a file named dataset_index.txt, which can be generated by our data pre-processing script. It should list the training and evaluation data in the following format:

<relative_route_path_dir> <num_data_frames_in_this_dir>
routes_town06_long_w7_11_28_18_28_35/ 1062
routes_town01_short_w2_11_16_08_27_10/ 1785
routes_town01_short_w2_11_16_09_55_05/ 918
routes_town02_short_w2_11_16_22_55_25/ 134
routes_town01_short_w2_11_16_11_44_08/ 569

Here, <relative_route_path_dir> should be a relative path to the $DATASET_ROOT. The training code will concatenate the $DATASET_ROOT and <relative_route_path_dir> to create the full path for loading the data. In this format, 1062 represents the number of frames in the routes_town06_long_w7_11_28_18_28_35/rgb_full directory or routes_town06_long_w7_11_28_18_28_35/lidar, etc.
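The data pre-processing script described below generates this index file for you. If you ever need to rebuild it by hand, a minimal shell sketch that follows the format above could look like this; it assumes one file per frame in each route's rgb_full directory, as described above, and writes relative paths with a trailing slash.

#!/bin/bash
# Minimal sketch: rebuild dataset_index.txt by counting frames per route folder.
# Assumes one file per frame in <route>/rgb_full.
DATASET_ROOT=$1

cd "$DATASET_ROOT"
> dataset_index.txt
for route_dir in routes_town*/; do
    num_frames=$(ls "${route_dir}rgb_full" | wc -l)
    echo "$route_dir $num_frames" >> dataset_index.txt
done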

Data Generation

Data Generation with multiple CARLA Servers

In addition to the dataset, we also provide all the scripts used for generating data; these can be modified as required for different CARLA versions. The dataset is collected by a rule-based expert agent under different weather conditions and in different towns.

Running CARLA Servers
# Start 4 CARLA servers: ip [localhost], ports [2000, 2002, 2004, 2006].
# You can adjust the number of CARLA servers according to your situation; more servers can collect more data.
# If you use N servers to collect data, you collect data N times on each route, except that the weather and traffic scenarios are random each time.

cd carla
CUDA_VISIBLE_DEVICES=0 ./CarlaUE4.sh --world-port=2000 -opengl &
CUDA_VISIBLE_DEVICES=1 ./CarlaUE4.sh --world-port=2002 -opengl &
CUDA_VISIBLE_DEVICES=2 ./CarlaUE4.sh --world-port=2004 -opengl &
CUDA_VISIBLE_DEVICES=3 ./CarlaUE4.sh --world-port=2006 -opengl &

Instructions for setting up docker are available here. Pull the CARLA 0.9.10.1 docker image: docker pull carlasim/carla:0.9.10.1.

Docker 18:

docker run -it --rm -p 2000-2002:2000-2002 --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 carlasim/carla:0.9.10.1 ./CarlaUE4.sh --world-port=2000 -opengl

Docker 19:

docker run -it --rm --net=host --gpus '"device=0"' carlasim/carla:0.9.10.1 ./CarlaUE4.sh --world-port=2000 -opengl

If the docker container doesn't start properly, add the environment variable -e SDL_AUDIODRIVER=dsp.
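For example, with the Docker 19 command above:

docker run -it --rm --net=host --gpus '"device=0"' -e SDL_AUDIODRIVER=dsp carlasim/carla:0.9.10.1 ./CarlaUE4.sh --world-port=2000 -opengl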

Run the Autopilot

Generate scripts for collecting data in batches.

cd dataset
python init_dir.py
cd ..
cd data_collection

# You can modify the FPS, waypoint distribution strength, etc. in auto_agent.yaml

# If you are not using 4 servers, you need to modify the following Python scripts
python generate_bashs.py
python generate_batch_collect.py 
cd ..

Run the batch-run scripts for the towns and route types you want to collect.

bash data_collection/batch_run/run_route_routes_town01_long.sh
bash data_collection/batch_run/run_route_routes_town01_short.sh
...
bash data_collection/batch_run/run_route_routes_town07_tiny.sh
...
bash data_collection/batch_run/run_route_routes_town10_tiny.sh

Note: our scripts use a random weather condition for data collection.

Data Generation with a single CARLA Server

With a single CARLA server, roll out the autopilot to start data generation.

carla/CarlaUE4.sh --world-port=2000 -opengl
./leaderboard/scripts/run_evaluation.sh

The expert agent used for data generation is defined in leaderboard/team_code/auto_pilot.py. Different variables which need to be set are specified in leaderboard/scripts/run_evaluation.sh.

Data Pre-processing

We provide some Python scripts for pre-processing the collected data in tools/data_preprocessing; some of them are optional. Please execute them in the following order (a combined example follows the list):

  1. python get_list_file.py $DATASET_ROOT: obtain the dataset_list.txt.
  2. python batch_merge_data.py $DATASET_ROOT: merge several scattered data files into one file to reduce IO time when training. [Optional]
  3. python batch_rm_rgb_data.py $DATASET_ROOT: delete redundant files after they have been merged into the new files. [Optional]
  4. python batch_stat_blocked_data.py $DATASET_ROOT: find the frames in which the ego vehicle is blocked for a long time. Removing them improves the data distribution and decreases the overall data size.
  5. python batch_rm_blocked_data.py $DATASET_ROOT: delete the blocked frames.
  6. python batch_recollect_data.py $DATASET_ROOT: since some frames have been removed, reorganize the remaining ones so that the frame IDs are continuous.
  7. python batch_merge_measurements.py $DATASET_ROOT: merge the measurement files from all frames in one route folder to reduce IO time.
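Putting the steps together, a simple driver script could look like the sketch below (run from the repository root with an absolute dataset path; the optional steps can be commented out):

#!/bin/bash
# Run the pre-processing scripts in order; $1 is the (absolute) dataset root.
DATASET_ROOT=$1

cd tools/data_preprocessing
python get_list_file.py "$DATASET_ROOT"            # 1. build the index file
python batch_merge_data.py "$DATASET_ROOT"         # 2. merge scattered files [optional]
python batch_rm_rgb_data.py "$DATASET_ROOT"        # 3. remove the merged originals [optional]
python batch_stat_blocked_data.py "$DATASET_ROOT"  # 4. find long-blocked frames
python batch_rm_blocked_data.py "$DATASET_ROOT"    # 5. delete the blocked frames
python batch_recollect_data.py "$DATASET_ROOT"     # 6. re-index frames so the IDs stay continuous
python batch_merge_measurements.py "$DATASET_ROOT" # 7. merge per-route measurement files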

Data Parsing

After collecting and pre-processing the data, we need to parse the navigation instructions and notice instructions data with some Python scripts in tools/data_parsing.

The script for parsing navigation instructions:

python3 parse_instruction.py $DATASET_ROOT

The parsed navigation clips will be saved in $DATASET_ROOT/navigation_instruction_list.txt, under the root directory of the dataset.

The script for parsing notice instructions:

python3 parse_notice.py $DATASET_ROOT

The parsed notice clips will be saved in $DATASET_ROOT/notice_instruction_list.txt.

The script for parsing misleading instructions:

python3 parse_misleading.py $DATASET_ROOT

The parsed misleading clips will be saved in $DATASET_ROOT/misleading_data.txt.
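Putting the three parsing steps together (a sketch assuming the scripts are run from tools/data_parsing; the dataset path is illustrative):

DATASET_ROOT=/path/to/dataset   # your dataset root

cd tools/data_parsing
python3 parse_instruction.py $DATASET_ROOT  # -> navigation_instruction_list.txt
python3 parse_notice.py $DATASET_ROOT       # -> notice_instruction_list.txt
python3 parse_misleading.py $DATASET_ROOT   # -> misleading_data.txt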

Training

LMDrive's training consists of two stages: 1) the vision encoder pre-training stage, to generate visual tokens from sensor inputs; and 2) the instruction-finetuning stage, to align the instruction/vision and control signal.

LMDrive is trained on 8 A100 GPUs with 80GB memory (the first stage can be trained on GPUs with 32GB memory). To train on fewer GPUs, reduce the batch size and the learning rate while maintaining their ratio (for example, on 4 GPUs instead of 8, halve both). If you do not collect the dataset yourself, please download the multi-modal instruction dataset collected in the CARLA simulator that we use in the paper from here or OpenXLab (uploading). You can download only part of it to verify our framework or your improvements.

Vision encoder pre-training

Pre-training the vision encoder takes around 2~3 days on 8x A100 (80G). Once the training is completed, you can find the checkpoint of the vision encoder in the output/ directory.

cd vision_encoder
bash scripts/train.sh

Some options to note:

  • GPU_NUM: the number of GPUs you want to use. By default, it is set to 8.
  • DATASET_ROOT: the root directory for storing the dataset.
  • --model: the architecture of the vision model. You can choose memfuser_baseline_e1d3_r26, which replaces ResNet50 with ResNet26. It's also possible to create new model variants in vision_encoder/timm/models/memfuser.py.
  • --train-towns/train-weathers: the data filter for the training dataset. Similarly, the corresponding options val-towns/val-weathers filter the validation dataset.

Instruction finetuning

Instruction finetuning takes around 2~3 days on 8x A100 (80G). Once the training is completed, you can find the checkpoint of the adapter and Q-Former in the lavis/output/ directory.

cd LAVIS
bash run.sh 8 lavis/projects/lmdrive/notice_llava15_visual_encoder_r50_seq40.yaml # 8 is the GPU number

Some options to note in the config YAML (a launch example follows the list):

  • preception_model: the model architecture of the vision encoder.
  • preception_model_ckpt: the checkpoint path of the vision encoder.
  • llm_model: the checkpoint path of the llm (Vicuna/LLaVA).
  • use_notice_prompt: whether to use notice instruction data when training.
  • split_section_num_for_visual_encoder: the number of sections the frames are divided into during the forward encoding of visual features. Higher values save more memory; the value needs to be a factor of token_max_length.
  • datasets:
    • storage: the root directory for storing the dataset.
    • towns/weathers: the data filter for training/evaluating.
    • token_max_length: the maximum number of frames; if the number of frames exceeds this value, they will be truncated.
    • sample_interval: the interval at which frames are sampled.
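For example, to launch the finetuning on 4 GPUs instead of 8 (a sketch; remember to scale the batch size and learning rate in the config accordingly, as noted in the Training section):

cd LAVIS
bash run.sh 4 lavis/projects/lmdrive/notice_llava15_visual_encoder_r50_seq40.yaml # 4 is the GPU number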

Evaluation

Start a CARLA server (as described above) and run the required agent. The corresponding routes and scenarios files are provided in leaderboard/data, and the required variables need to be set in leaderboard/scripts/run_evaluation.sh.

Some options need to be updated in the leaderboard/team_code/lmdrive_config.py:

  • preception_model: the model architecture of the vision encoder.
  • preception_model_ckpt: the checkpoint path of the vision encoder (obtained in the vision encoder pretraining stage).
  • llm_model: the checkpoint path of the llm (LLaMA/Vicuna/LLaVA).
  • lmdrive_ckpt: the checkpoint path of LMDrive (obtained in the instruction finetuning stage).

Update leaderboard/scripts/run_evaluation.sh to include the following code for evaluating the model on the LangAuto (long) benchmark.

export CARLA_ROOT=/path/to/carla/root
export TEAM_AGENT=leaderboard/team_code/lmdrive_agent.py
export TEAM_CONFIG=leaderboard/team_code/lmdrive_config.py
export CHECKPOINT_ENDPOINT=results/lmdrive_result.json
export SCENARIOS=leaderboard/data/official/all_towns_traffic_scenarios_public.json
export ROUTES=leaderboard/data/LangAuto/long.xml
CUDA_VISIBLE_DEVICES=0 ./leaderboard/scripts/run_evaluation.sh

Here, the long.json and long.xml files are replaced with short.json and short.xml to evaluate the agent on the LangAuto-Short benchmark.
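For example (assuming short.json and short.xml sit alongside the long and tiny files in leaderboard/data/LangAuto/):

export SCENARIOS=leaderboard/data/LangAuto/short.json
export ROUTES=leaderboard/data/LangAuto/short.xml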

For LangAuto-Tiny benchmark evaluation, replace the long.json and long.xml files with tiny.json and tiny.xml:

export SCENARIOS=leaderboard/data/LangAuto/tiny.json
export ROUTES=leaderboard/data/LangAuto/tiny.xml

LangAuto-Notice

Set agent_use_notice to True in lmdriver_config.py.

Citation

If you find our repo, dataset or paper useful, please cite us as

@misc{shao2023lmdrive,
      title={LMDrive: Closed-Loop End-to-End Driving with Large Language Models}, 
      author={Hao Shao and Yuxuan Hu and Letian Wang and Steven L. Waslander and Yu Liu and Hongsheng Li},
      year={2023},
      eprint={2312.07488},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

This implementation is based on code from several repositories.

License

All code within this repository is under Apache License 2.0.


lmdrive's Issues

Questions about evaluation performance

Thank you for the active sharing of your valuable research.

I am deeply impressed by your research and tested a few things, but I have some questions.

  1. I am curious about which configuration you used to obtain the DS values you reported.
    https://github.com/opendilab/LMDrive?tab=readme-ov-file#lmdrive-weights

Here are my configurations:
leaderboard/team_code/lmdrive_config.py

import os
 
 
class GlobalConfig:
    """base architecture configurations"""
 
    # Controller
    turn_KP = 1.25
    turn_KI = 0.75
    turn_KD = 0.3
    turn_n = 40  # buffer size
 
    speed_KP = 5.0
    speed_KI = 0.5
    speed_KD = 1.0
    speed_n = 40  # buffer size
 
    max_throttle = 0.75  # upper limit on throttle signal value in dataset
    brake_speed = 0.1  # desired speed below which brake is triggered
    brake_ratio = 1.1  # ratio of speed to desired speed at which brake is triggered
    clip_delta = 0.35  # maximum change in speed input to longitudinal controller
 
    llm_model = 'weights/llava-v1.5-7b'
    preception_model = 'memfuser_baseline_e1d3_return_feature'
    preception_model_ckpt = 'weights/LMDrive-vision-encoder-r50-v1.0/vision-encoder-r50.pth.tar'
    lmdrive_ckpt = 'weights/LMDrive-llava-v1.5-7b-v1.0/llava-v1.5-checkpoint.pth'
 
    agent_use_notice = False # True
    sample_rate = 2
 
 
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

leaderboard/scripts/run_evaluation.sh

export ROUTES=langauto/benchmark_long.xml
export TEAM_AGENT=leaderboard/team_code/lmdriver_agent.py # agent
export TEAM_CONFIG=leaderboard/team_code/lmdriver_config.py # model checkpoint, not required for expert
export CHECKPOINT_ENDPOINT=results/sample_result.json # results file
#export SCENARIOS=leaderboard/data/scenarios/no_scenarios.json #town05_all_scenarios.json
export SCENARIOS=leaderboard/data/official/all_towns_traffic_scenarios_public.json

Are these configurations accurate? If these configurations are incorrect, then the following questions may not be necessary.

  2. The results of the evaluation using the above settings are as follows. However, higher performance was achieved when agent_use_notice = False. What could be the reason for this? Also, the DS (LangAuto) you reported seems to have been obtained from the "score_route" value. Does DS (LangAuto) not consider infraction penalties?

agent_use_notice = True

"scores": {
                "score_composed": 22.576320522257177,   
                "score_penalty": 0.7903388993546836,        
                "score_route": 27.799761292913487
}, 

agent_use_notice = False

"scores": {
                "score_composed": 29.6279197611852,
                "score_penalty": 0.8052244697640636,
                "score_route": 36.35275911099924
},

CARLA 0.9.10.1 - broken link?!

During setup, we are instructed to "Download and setup CARLA 0.9.10.1"; however, that doesn't work.
Checking the CARLA releases, I found this:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

NoSuchBucket
The specified bucket does not exist
carla-releases
CMB3TJFPWA2B54Q4

UcYvb/qC9/dmT7512dEp/qgq3piqF+TZqYrjsRSyiEgv7UDlmSSvssjWL3CC7u89eg+QAGLM7xw=

I'll proceed with installing CARLA 0.9.15, hoping that there is no big reason to use the older version instead.
Is there? If not, it would be nice if you could update the instructions.

Thanks! Great work, btw.

BEV Map Visualization

Hi,
In #21 you give some pointers to the BEV map coordinate, I'm wondering how we can visualize the BEV map processing results made by the vision encoder (like Fig 5 in the paper). Can you give me some brief instructions on how to do that (or pointers)?

Thanks!

RuntimeError: Error(s) in loading state_dict for ResNet

Hi! Much appreciated for the excellent work!
When doing instruction finetuning, I encountered an error:

WARNING:root:Pytorch pre-release version 1.14.0a0+410ce96 - assuming intent to test it
/usr/local/lib/python3.8/dist-packages/diffusers/models/cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
deprecate(
| distributed init (rank 0, world 1): env://
[1704792300.373239] [7771d2eff014:2391 :f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
Traceback (most recent call last):
File "train.py", line 103, in
main()
File "train.py", line 94, in main
model = task.build_model(cfg)
File "/workspace/code/LMDrive/LAVIS/lavis/tasks/drive.py", line 35, in build_model
return model_cls.from_config(model_config)
File "/workspace/code/LMDrive/LAVIS/lavis/models/drive_models/drive.py", line 575, in from_config
model = cls(
File "/workspace/code/LMDrive/LAVIS/lavis/models/drive_models/drive.py", line 87, in init
self.visual_encoder.load_state_dict(pretrain_weights, strict=True)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1918, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict:...

In your original config file "notice_llava15_visual_encoder_r50_seq40.yaml",
preception_model: memfuser_baseline_e1d3_return_feature
causes "RuntimeError: Unknown model (memfuser_baseline_e1d3_return_feature)".
So I changed 'memfuser_baseline_e1d3_return_feature' to 'resnet50', and the above 'RuntimeError: Error(s) in loading state_dict for ResNet:' occurred. Do you know how to fix this?
I also noticed another error: "vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device". Does it have something to do with my failure?
Many thanks and looking forward to your reply.

download issues

Thank you for your great work! I'm having some download issues.
Due to a network problem, I can't download "deepcs233/LMDrive-vicuna-v1.5-7b-v1.0" from the console, and the checkpoint downloaded manually is incomplete: the original checkpoint is 946MB, while the manual download is 903MB. Is there any other way to download it?
Thanks again!

Question about 2D bounding box

I want to transform the 3d_bbs into 2D bounding boxes for each frame, but I can't get the extrinsic matrix of the sensor. Can you offer some suggestions for obtaining the 2D bounding boxes? Thanks!

Use simulator and PythonAPI with different versions, Segmentation fault occurs

In https://github.com/opendilab/LMDrive/issues/14#issuecomment-1898154122, the author changed the PythonAPI version to 0.9.12 and it works fine.

However, when I use the simulator and PythonAPI with different versions, a segmentation fault occurs.
Does anyone face the same situation?

The port is definitely open.

Python 3.8.18 | packaged by conda-forge | (default, Dec 23 2023, 17:21:28)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.3 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import carla
In [2]: client = carla.Client('localhost', 6008)
In [3]: traffic_manager = client.get_trafficmanager(6009)
WARNING: Version mismatch detected: You are trying to connect to a simulator that might be incompatible with this API
WARNING: Client API version = 0.9.12
WARNING: Simulator API version = 784d9b9f
Segmentation fault

Originally posted by @dingli-dean in #14 (comment)

About Training Time

Hi authors! Thanks for your excellent work. I have run into low training efficiency. The second instruction-finetuning stage takes about 6 days on my 8x A100 (40G) GPUs, using only the Town01 data downloaded from OpenXLab. I noticed you mentioned that instruction finetuning takes around 3 days on 8x A100 (80G). Did you use all the data (Town01-Town07, Town10) during the finetuning stage? And what could be the possible reasons for the slowdown on my machine?

Where is generate_yamls.py?

I want to reproduce your process of generating data, but according to your tutorial

cd dataset
python init_dir.py
cd ..
cd data_collection
python generate_yamls.py # You can modify FPS, waypoints distribution strength ...

# If you do not use 4 servers, the following Python scripts are needed to modify
python generate_bashs.py
python generate_batch_collect.py 
cd ..

I did not find the generate_yamls.py file or the scripts mentioned below. Can you give me more information about them and release them?

Segmentation fault during evaluation

Hi, I could train the model, but I met a segmentation fault during evaluation. Did anyone meet a similar problem?

CUDA_VISIBLE_DEVICES=0 leaderboard/scripts/run_evaluation.sh
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
leaderboard/leaderboard/leaderboard_evaluator.py
leaderboard/scripts/run_evaluation.sh: line 44: 646601 Segmentation fault (core dumped) python3 -u ${LEADERBOARD_ROOT}/leaderboard/leaderboard_evaluator.py --scenarios=${SCENARIOS} --routes=${ROUTES} --repetitions=${REPETITIONS} --track=${CHALLENGE_TRACK_CODENAME} --checkpoint=${CHECKPOINT_ENDPOINT} --agent=${TEAM_AGENT} --agent-config=${TEAM_CONFIG} --debug=${DEBUG_CHALLENGE} --record=${RECORD_PATH} --resume=${RESUME} --port=${PORT} --trafficManagerPort=${TM_PORT}

Training Issues

Thank you for your excellent work! I have some trouble with training:
I tried to install Slurm for cluster job scheduling, but unfortunately many attempts failed. What we want to know is whether there is any impact on training if we don't use the srun command but execute the training script directly (for example, running ./distributed_pretrain.sh 8 '/path/to/your/dataset' ... in the pre-training stage)?

Evaluation speed is too slow

Hi team, thanks for your attention.
I tried to evaluate the released model LMDrive-llava-v1.5-7b-v1.0 and observed that the evaluation process is very slow.
Here are the statistics of the evaluation for one route scenario; is this the expected evaluation speed?
The evaluation was run on one 3090 GPU.

========= Results of RouteScenario_51 (repetition 0) ------ SUCCESS =========

Start Time                       : 2024-03-26 12:46:58
End Time                         : 2024-03-26 13:58:15
Duration (System Time)           : 4276.92s
Duration (Game Time)             : 324.25s
Ratio (System Time / Game Time)  : 0.076

Criterion              | Result  | Value
-----------------------+---------+---------
RouteCompletionTest    | SUCCESS | 100 %
OutsideRouteLanesTest  | SUCCESS | 0 %
CollisionTest          | SUCCESS | 0 times
RunningRedLightTest    | SUCCESS | 0 times
RunningStopTest        | SUCCESS | 0 times
InRouteTest            | SUCCESS |
AgentBlockedTest       | SUCCESS |
Timeout                | SUCCESS |

no audio card

Hi, my server doesn't have an audio card. How can I disable audio in CARLA? Thank you!

Training process related to misleading information

Hello, thank you for your wonderful work!

I have a question. How is the training setup for the misleading instructions mentioned in the paper implemented? I don't seem to see it in the code.

Looking forward to your reply, thank you~

BEV map coordinate

็”Ÿๆˆbev mapๅ›พๆ—ถ๏ผŒx,yๆ–นๅ‘็š„ๅๆ ‡็ณปๆ˜ฏไป€ไนˆ๏ผŸ
ๆˆ–่€…่ฏด measurements้‡Œ็š„GPS_x,GPS_y็š„ๅๆ ‡็ณปๆ˜ฏไป€ไนˆ๏ผŸ
่ฐข่ฐข

About the LLM model

Hi! Much appreciated for the excellent work!
I noticed that 'the checkpoint path of the llm (LLaMA/Vicuna/LLaVA)' is required for evaluation. I wonder whether I need to install LLaVA following their GitHub project, or whether just downloading the LLaVA checkpoints is adequate to run the evaluation of LMDrive?

Evaluation on Town-05 Long benchmark

Hi, team. Thanks for releasing the exceptional work.
I tried to evaluate the released model (llava-v1.5) on the Town05 Long benchmark (with leaderboard/data/evaluation_routes/routes_town05_long.xml), and sadly observed that the results are significantly lower than current SOTA methods.
[screenshot of evaluation results]

Did you evaluate LMDrive model on town-05 long benchmark? If so, can you show the performance comparison between LMDrive and other methods?

Thanks again for your attention, and look forward to your reply.

carla can't run on GPU

Hi, I'm using two terminals to run CARLA and the evaluation code separately.

I installed CARLA in a conda environment on a remote server, installed OpenGL in a docker container, and used ssh to connect remotely via MobaXterm with X11 forwarding configured for visualization.

I execute the following command to run CARLA: CUDA_VISIBLE_DEVICES=0 ./carla/CarlaUE4.sh -prefernvidia -opengl -carla-server -benchmark -fps=5 -nosound --world-port=15302

[screenshots]

But CARLA runs without the GPU, so no environment is rendered.
[screenshot]

Can you please tell me how to solve this problem?

Evaluation error

My lmdriver_config.py was set as:

llm_model = 'C:/doctor/LMDrive-main/results/llava-v1.5-7b'
preception_model = 'memfuser_baseline_e1d3_return_feature'
preception_model_ckpt = 'C:/doctor/LMDrive-main/results/vision-encoder-r50.pth.tar'
lmdrive_ckpt = 'C:/doctor/LMDrive-main/results/llava-v1.5-checkpoint.pth'

But when I run the code, it gets stuck and reports an error:
[screenshot of the error]
Have you ever encountered this kind of situation?

real vehicles

Hi, thanks for your great work! I would like to inquire: based on your experience and research, do you believe it is feasible to deploy this work on real vehicles, particularly considering that our current computational resources are 2 A800 GPUs (80G)? In this scenario, how long do you think it might take us to achieve this goal? Are there any key technical challenges or issues that need to be addressed?

How to generate corresponding instructions in a new Town? Such as Town13?

Hi!

I found that both './tools/data_parsing' and 'InstructionPlanner' have some predefined values, such as coordinates corresponding to each town. For a new town, such as leaderboard 2.0's Town13, how can these values be generated?

Looking forward to your reply, thank you~

For example:

  1. leaderboard/team_code/planner.py
    'self.highway_mapping = {"Town04":[[-487.84,361.47,2.84,44.26],[-19.73,18.43,-279.10,278.82],[94.88,333.41,-360.95,-398.73],[-376.92,-93.35,400.16,440.26],[-517.71,-478.04,37.87,319.51]],
    "Town05":[[-257.43,-217.99,-179.75,175.86],[184.14,218.66,-175.28,174.62],[-204.68,162.79,-217.05,-181.86],[-210.39,179.10,182.47,218.67]],
    "Town06":[[-302.75,625.72,-8.10,-26.18],[-278.19,651.95,35.46,54.74],[-286.53,649.00,135.70,155.15],[-323.99,647.78,236.17,254.11],[656.33,673.07,12.02,228.79],[-372.63,-359.37,13.01,230.13]] }'

  2. tools/data_parsing/turn_rules.py
    class TurnFalse(Turn):
    def __init__(self, direction):
    super().__init__()
    self.sample_range = 100
    self.direction = direction
    self.direction_index_mapping = {"left": 0, "right": 1}
    self.direction_command_mapping = {"left": 2, "right": 1}
    self.total_range = 128
    self.Tjunction_loc_mapping = {"1": [[[74.31,2.50,2],[140.33,2.33,2],[320.09,2.32,2],[92.69,72.83,2],[173.38,55.50,2],[334.70,41.42,2],[92.29,147.24,2],[334.53,114.74,2],[92.81,215.65,2],[334.66,179.34,2],[107.98,326.93,2],[354.03,326.60,2]],[[107.74,-1.54,2],[173.88,-1.73,2],[352.97,-1.82,2],[88.03,41.60,2],[139.77,59.79,2],[339.00,73.74,2],[88.19,115.30,2],[339.00,148.54,2],[88.08,181.66,2],[338.96,214.46,2],[73.55,330.99,2],[319.83,331.13,2]]],
    "2": [[[-3.23,205.09,2],[29.32,192.01,2],[117.29,192.16,2],[189.37,175.07,2],[45.80,253.94,2],[148.54,237.29,2],[189.51,225.80,2],[58.12,302.90,2]],[[-7.67,176.45,2],[58.12,187.84,2],[150.38,187.84,2],[193.96,204.03,2],[41.60,223.79,2],[119.50,241.48,2],[193.75,253.49,2],[28.17,307.19,2]]],
    "3": [[[-18.78,-195.73,3],[149.05,-151.50,3],[84.34,-58.19,3],[147.97,-88.64,3],[103.13,-4.93,3],[172.18,-3.80,3],[233.99,-22.35,3],[232.47,37.33,3],[146.53,62.99,2],[22.64,195.33,3]],[[26.14,-205.40,3],[153.95,-116.42,2],[152.70,-58.14,2],[79.87,-88.97,2],[58.55,6.42,3],[129.03,7.83,3],[243.73,27.13,3],[190.25,58.98,2],[241.73,86.48,3],[-24.61,204.59,3]]],
    "4": [[[179.01,-369.13,5],[242.51,-307.31,2],[81.06,-173.40,2],[148.32,-172.98,2],[388.35,-193.76,5],[272.62,-122.34,2],[386.55,-90,5]],[[272.72,-310.59,2],[44.36,-169.96,2],[116.76,-169.19,2],[239.88,-118.47,2]]],
    "5": [[[-149.31,-136.65,3],[-101.78,145.94,3],[7.10,-189.40,5],[55.46,191.61,5],[33.34,162.25,3],[151.32,-18.97,2],[36.20,-129.62,3]],[[-103.21,-143.83,3],[-149.73,153.17,3],[30.16,-165.07,3],[26.36,125.75,3],[155.49,17.09,2]]],
    "6": [[],[[665.92,66.59,5],[664.88,168.47,5]]],
    "7": [[[-197.95,-147.58,2],[-85.62,-158.09,2],[-4.47,-171.80,2],[-5.13,-121.57,2],[-98.38,-50.71,2],[-5.83,-76.91,2],[-199.07,-22.44,2],[-99.01,15.57,2],[-199.03,64.57,2],[-138.96,48.95,2],[10.28,58.64,2],[-95.56,116.56,2]],[[-202.47,-174.34,2],[-202.13,-49.13,2],[-203.63,36.88,2],[-122.42,117.61,2],[-0.51,-145.93,2],[-1.19,-50.03,2]]],
    "10": [[[64.47,64.40,3]],[[19.83,71.62,3]]]}
    self.intersection_loc_mapping = {"1": [[90.30,0.51,25],[156.93,1.09,25],[336.86,1.39,25],[337.33,326.93,40],[90.95,327.01,40],[92.37,196.73,30],[91.87,131.36,30],[92.17,57.97,25],[156.05,55.61,25],[335.12,57.68,25],[335.78,130.58,30],[336.32,196.97,30]],
    "2": [[43.31,304.10,30],[-5.34,190.45,30],[192.52,189.99,30],[190.68,239.30,30],[134.06,238.50,30],[43.77,238.49,30],[43.51,190.68,30],[133.09,189.44,30]],
    "3": [[3.93,-199.79,35],[236.43,0.77,30],[237.27,61.02,30],[-1.39,196.76,35],[151.53,-132.98,30],[149.59,-72.75,30],[80.90,-74.44,30],[148.68,-5.98,30],[78.58,-5.19,30],[169.12,64.11,30],[-226.23,-2.30],[-223.15,103.26],[83.79,-257.12],[157.84,-256.18],[-146.60,-1.44],[-84.86,133.58],[-2.82,132.36],[-81.72,-137.82],[2.44,-135.59],[83.89,-135.75],[85.39,-199.39],[153.65,-198.61]],
    "4": [[257.15,-308.29,25],[256.30,-122.12,25],[128.78,-172.50,30],[61.36,-174.60,25],[15.01,-172.33,25],[205.67,-364.69,30],[393.50,-171.28,25],[381.09,-67.54,30],[203.01,-309.33],[202.12,-247.58],[200.61,-171.29],[256.94,-248.01],[256.49,-170.93],[313.26,-248.37]],
    "5": [[34.01,-182.82,20],[40.02,-147.67,20],[153.47,-0.52,25],[40.85,142.48,25],[30.24,198.96,30],[-126.12,-137.57,20],[-124.06,148.97,25],[-268.82,-1.19,30],[-189.88,-90.40,30],[-189.49,0.79,30],[-190.41,89.65,30],[-127.13,-89.45,30],[-126.58,1.19,30],[-125.56,89.59,30],[-49.85,-89.76,30],[-49.13,0.86,30],[-49.28,89.65,30],[31.55,-89.33,30],[29.53,0.28,30],[29.20,89.69,30],[101.55,-0.07,30]],
    "6": [[662.70,41.96,40],[662.41,144.54,40],[-1.63,-17.53,40],[-1.84,49.77,40],[-0.50,141.78,40],[1.29,244.84,40]],
    "7": [[-197.22,-161.53,40],[-1.85,-238.09,40],[67.08,-1.04,35],[67.25,60.09,35],[-109.01,113.97,35],[-198.61,49.24,25],[-198.65,-36.34,25],[-151.27,48.35,25],[-100.17,-34.76,15],[-100.46,-63.77,15],[-101.47,-96.25,10],[-85.31,-111.70,10],[-73.35,-159.14,30],[-3.43,-159.27,30],[-4.05,-107.83,15],[-4.45,-64.86,20],[-4.79,57.83,35],[-101.62,53.08],[-3.78,-1.48],[-150.54,-35.13]],
    "10": [[-44.76,-55.94,30],[96.00,-21.14,30],[96.84,68.01,30],[-46.40,127.21,30],[-99.79,19.70,30],[-38.44,65.96,30],[41.59,66.94,20],[41.08,30.14,20],[-47.38,19.22]]}

Failed to run carla.

Hi, I met a problem while running CARLA. Your help would be greatly appreciated.
My command is:
carla/CarlaUE4.sh --world-port=2000 -opengl
Error info is:
Refusing to run with the root privileges.

Performance metrics in the paper

Hi, thanks for your nice work. I have a question about reproducing the driving score shown in the paper. I run the evaluation with the following configurations:

    preception_model = 'memfuser_baseline_e1d3_return_feature'
    preception_model_ckpt = '/LMDrive/ckpt/vision-encoder-r50.pth.tar'
    llm_model = '/LMDrive/llm_model/llava-v1.5-7b'
    lmdrive_ckpt = '/LMDrive/ckpt/llava-v1.5-checkpoint.pth'
    agent_use_notice = False
    sample_rate = 2

When I compare the results I reproduce (obtained from the result.json file), the "Avg. driving score" and "Avg. route completion" are lower than the metrics shown in the paper (the Avg. infraction penalty is the same). Were the values shown in Table 2 of the paper also normalized by the driving distance, in the same way as Table 4? Or is there any possibility that my configuration differs from yours?

Thank you!

Minimum gpu memory to run evaluation?

Hi,

I'm trying to run the evaluation through run_evaluation.sh on an RTX 3070 Ti GPU, but the GPU ran out of memory while loading the models. Can you share your evaluation setup? Do you have a rough sense of how much GPU memory is needed to run such experiments?

Thanks.

Evaluation Scripts Clarification

Hi @deepcs233
Thanks for your great work.

Our team is investigating this topic and hopes to re-implement the code to reproduce the metrics reported in your paper.

However, we have difficulties during evaluation (maybe we are unfamiliar with running the CARLA server).

Could you add more details to the evaluation part of the README?
It would be great if your guidance could help us replicate the metrics claimed in your paper.

Thanks again.

Control signals generated by the LLM

Section 4.2 of the paper mentions that the LLM generates the necessary control signals and predicts whether the given instruction has been completed. However, in leaderboard/team_code/lmdriver_agent.py, it seems that only the predicted waypoints are used for navigation, and the control signals predicted by the LLM are not used.
Could you help explain the relationship between the LLM and the control signals in the paper, and what role they play? I initially thought that the LLM directly generates the control signals that drive the vehicle after taking the other modality inputs.

Where should the LMDrive Weights be put?

Sorry, I'm quite confused.
Where should the models be put?
By inspecting some of the scripts, it seems there should be a folder 'data'... but I couldn't track it all.

Thank you in advance.

Model principles and running problems in carla

@deepcs233
Hello, thank you for your work. I have two questions for you.
1ใ€This is about the model itself. At that time, when adopting a multimodal large language model, why did llava framework not directly use a linear layer for visual mode and language mode alignment, but Qformer used by blip2 framework for alignment? What other considerations are there?
2ใ€When the model was running in the evaluation part, pygame and multi-modal large model parameters were not loaded, and no error was reported, and the program died directly. The premise has been deployed in accordance with the operation of the model environment, downloaded the weight parameters of the model, began to report errors, after modification has not reported errors, but it is not running.
We look forward to your reply and thank you very much.

Stuck during evaluation

Hello, thank you very much for your excellent work. During evaluation, I keep getting stuck on this screen; everything was set up according to the tutorial in the README.

[Image 4]

dataset

Is it normal for the dataset download to be stuck here? It has been stuck for several days.
[screenshot]
I downloaded it from OpenXLab.

RuntimeError: Python version >= 3.9 required

The suggested Python version is 3.8, but after executing the following

conda create -n lmdrive python=3.8
conda activate lmdrive
cd vision_encoder
python setup.py develop

I got this error:

  File "/tmp/easy_install-7_5rk8p_/numpy-1.26.3/setup.py", line 22, in <module>
    author_email="[email protected]",
RuntimeError: Python version >= 3.9 required.

Could you test this and suggest a solution? Thanks.

evaluationๅกไฝ

Hello, thank you very much for your work and for open-sourcing it. When I run the evaluation code, the program exits directly in the state shown below. Could you tell me why?
[screenshot]

Questions about model.

Have you tried unfreezing the LLM parameters during training? If so, how does the accuracy compare with the current frozen LLM?

Questions on generating own routes and data

Hi,

Thanks for open-sourcing your great work. I want to create some custom routes, and I saw that the data generation process requires a bunch of .xml configs in leaderboard/data/training routes. I wonder how to generate those .xml files to create our own routes (e.g. how to correctly set the waypoints in these .xml files)?

Missing the file "dataset_index_test.txt"

https://github.com/opendilab/LMDrive/blob/684e92241b9969eed9c3c85c9c20fd8842d426dd/vision_encoder/timm/data/carla_dataset.py#L179C62-L179C80

Thanks for your work!

I'm trying to follow your instructions to fine-tune the vision encoder. However, the code shown above needs a file named "dataset_index_test.txt". According to https://huggingface.co/datasets/deepcs233/LMDrive, there is only a file named "dataset_index.txt". I also renamed that file to match your code, but then ran into many FileNotFoundError exceptions when running, indicating that the two files do not match. Any idea about this?

Evaluation settings

Hi, thanks for your excellent work and code release. According to the README, it is required to update the scenario .json file and route .xml file accordingly. However, there is no leaderboard/data/LangAuto/ folder. I'd like to know whether there is something wrong with the folder structure or whether I can simply leave the scenario file unchanged. Looking forward to your reply.

About rpc::rpc_error during call in function get_sensor_token

Traceback (most recent call last):
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 483, in main
    leaderboard_evaluator = LeaderboardEvaluator(arguments, statistics_manager)
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 88, in __init__
    self.traffic_manager = self.client.get_trafficmanager(int(args.trafficManagerPort))
RuntimeError: rpc::rpc_error during call in function get_sensor_token
Exception ignored in: <function LeaderboardEvaluator.__del__ at 0x7fbd8e0e3f70>
Traceback (most recent call last):
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 127, in __del__
    self._cleanup()
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 139, in _cleanup
    if self.manager and self.manager.get_running_status()
AttributeError: 'LeaderboardEvaluator' object has no attribute 'manager'
Traceback (most recent call last):
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 493, in <module>
    main()
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 489, in main
    del leaderboard_evaluator
UnboundLocalError: local variable 'leaderboard_evaluator' referenced before assignment

inference frequency

Hi!
Thanks for your excellent work. What is the inference frequency of the model on an A100?
Thanks!

Confusion about the impact of the LLM Models

Thanks for your excellent work. I would like to ask whether you have compared the results of replacing the LLM with a traditional Transformer for end-to-end training. Judging from the current closed-loop results on Town05, the models introducing an LLM do not seem to surpass traditional models such as InterFuser. Where are the performance gains brought by the LLM mainly reflected?

Questions about evaluation.

Hi! I have 2 questions about evaluation. I'd be grateful for your help.

  1. Where do I set the dataset directory before evaluation?
  2. When running the following code, the pygame window crashes and the program gets stuck.
# leaderboard/team_code/lmdriver_agent.py: DisplayInterface.__init__
        self._display = pygame.display.set_mode(
            (self._width, self._height), pygame.HWSURFACE | pygame.DOUBLEBUF
        )

When I execute the following code separately, the pygame window opens and stays open.

import pygame
pygame.init()
pygame.font.init()
pygame.display.set_mode((1200, 900), pygame.HWSURFACE | pygame.DOUBLEBUF)

Thanks in advance!
