
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model

Dilxat Muhtar*, Zhenshi Li*, Feng Gu, Xueliang Zhang, and Pengfeng Xiao
(*Equal Contribution)

News | Introduction | Preparation | Demo | Acknowledgement | Statement

News

  • [Feb 21 2024]: We have updated our evaluation code. Any advice is welcome!
  • [Feb 7 2024]: Model weights are now available on both Google Drive and Baidu Disk.
  • [Feb 6 2024]: Our paper is now available on arXiv.
  • [Feb 2 2024]: We are excited to announce the release of our code and model checkpoint! Our dataset and training recipe will be updated soon!

Introduction

We are excited to introduce LHRS-Bot, a multimodal large language model (MLLM) that leverages globally available volunteered geographic information (VGI) and remote sensing (RS) images. LHRS-Bot demonstrates a deep understanding of RS imagery and is capable of sophisticated reasoning within the RS domain. In this repository, we will release our code, training framework, model weights, and dataset!

Preparation

Installation

  1. Clone this repository.

    git clone git@github.com:NJU-LHRS/LHRS-Bot.git
    cd LHRS-Bot
  2. Create a new virtual environment

    conda create -n lhrs python=3.10
    conda activate lhrs
  3. Install the dependencies and our package

    pip install -e .
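
  4. (Optional) Verify the installation. A minimal sanity check, assuming PyTorch is pulled in as one of the package dependencies:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"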

Checkpoints

  • LLaMA2-7B-Chat

    • Automatically download:

      Our framework is designed to automatically download the checkpoint when you initiate training or run a demo. However, there are a few preparatory steps you need to complete:

      1. Request access to the LLaMA2-7B models from the Meta website.

      2. After your request has been processed, log in to Hugging Face using your personal access token (a non-interactive alternative is sketched at the end of this section):

        huggingface-cli login
        (Then paste your access token and press Enter)
      3. Done!

    • Manually download:

      • Download all the files from HuggingFace.

      • Change the following line in each file to point to your downloaded directory:

        • /Config/multi_modal_stage{1, 2, 3}.yaml

          ...
          text:
          	...
            path: ""  # TODO: Direct to your directory
          ...
        • /Config/multi_modal_eval.yaml

          ...
          text:
          	...
            path: ""  # TODO: Direct to your directory
          ...
  • LHRS-Bot Checkpoints:

    | Stage 1 | Stage 2 | Stage 3 |
    | ------- | ------- | ------- |
    | Baidu Disk, Google Drive | Baidu Disk, Google Drive | Baidu Disk, Google Drive |
    • ⚠️ Ensure that the TextLoRA folder is located in the same directory as FINAL.pt, and that the name TextLoRA remains unchanged. Our framework will automatically detect the version of the perceiver checkpoint and, if possible, load and merge the LoRA module (see the example layout at the end of this section).

    • Development Checkpoint:

      We will continually update our model with advanced techniques. If you're interested, feel free to download it and have fun :)

      | Development |
      | ----------- |
      | Baidu Disk, Google Drive |
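
The Hugging Face login described above can also be performed non-interactively, which is convenient on remote machines. A minimal sketch, assuming your personal access token is stored in an HF_TOKEN environment variable (the variable name is only a placeholder):

    # Non-interactive Hugging Face login; HF_TOKEN is a placeholder for your token
    huggingface-cli login --token $HF_TOKEN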
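
For reference, a prepared checkpoint directory might look like the following (the checkpoints/ directory name is only illustrative; what matters is that TextLoRA keeps its name and sits next to FINAL.pt):

    checkpoints/
        FINAL.pt      # stage checkpoint passed via --checkpoint-path / --model-path
        TextLoRA/     # keep this folder name; the LoRA weights inside are detected and merged automatically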

Demo

  • Online Web UI demo with gradio:

    python lhrs_webui.py \
         -c Config/multi_modal_eval.yaml \           # config file
         --checkpoint-path ${PathToCheckpoint}.pt \  # path to the .pt checkpoint
         --server-port 8000 \                        # change if needed
         --server-name 127.0.0.1 \                   # change if needed
         --share                                     # add if you want to share with others
  • Command line demo (a copy-pasteable example is given at the end of this section):

    python cli_qa.py \
         -c Config/multi_modal_eval.yaml \                 # config file
         --model-path ${PathToCheckpoint}.pt \             # path to the .pt checkpoint
         --image-file ${TheImagePathYouWantToChat} \       # path to the image file (only a single image is supported)
         --accelerator "gpu" \                             # choose from ["mps", "cpu", "gpu"]
         --temperature 0.4 \
         --max-new-tokens 512
  • Inference:

    • Classification

      python main_cls.py \
           -c Config/multi_modal_eval.yaml \                 # config file
           --model-path ${PathToCheckpoint}.pt \             # path to the .pt checkpoint
           --data-path ${ImageFolder} \                      # path to the classification image folder
           --accelerator "gpu" \                             # choose from ["mps", "cpu", "gpu"]
           --workers 4 \
           --enabl-amp True \
           --output ${YourOutputDir} \                       # path to the output directory (results, metrics, etc.)
           --batch-size 8
    • Visual Grounding

      python main_vg.py \
           -c Config/multi_modal_eval.yaml \                 # config file
           --model-path ${PathToCheckpoint}.pt \             # path to the .pt checkpoint
           --data-path ${ImageFolder} \                      # path to the image folder
           --accelerator "gpu" \                             # choose from ["mps", "cpu", "gpu"]
           --workers 2 \
           --enabl-amp True \
           --output ${YourOutputDir} \                       # path to the output directory (results, metrics, etc.)
           --batch-size 1 \                                  # batch size 1 is recommended; batch inference can be unstable
           --data-target ${ParsedLabelJsonPath}              # path to the parsed label JSON file
    • Visual Question Answering

      python main_vqa.py \
           -c Config/multi_modal_eval.yaml \                 # config file
           --model-path ${PathToCheckpoint}.pt \             # path to the .pt checkpoint
           --data-path ${Image} \                            # path to the image folder
           --accelerator "gpu" \                             # choose from ["mps", "cpu", "gpu"]
           --workers 2 \
           --enabl-amp True \
           --output ${YourOutputDir} \                       # path to the output directory (results, metrics, etc.)
           --batch-size 1 \                                  # batch size 1 is recommended; batch inference can be unstable
           --data-target ${ParsedLabelJsonPath} \            # path to the parsed label JSON file
           --data-type "HR"                                  # choose from ["HR", "LR"]
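
  • Copy-pasteable example:

    The inline comments above are for explanation only (they would break shell line continuation if pasted verbatim). Below is a clean invocation of the command-line demo; the checkpoint and image paths are placeholders to replace with your own:

      python cli_qa.py \
           -c Config/multi_modal_eval.yaml \
           --model-path ./checkpoints/FINAL.pt \
           --image-file ./examples/demo.png \
           --accelerator "gpu" \
           --temperature 0.4 \
           --max-new-tokens 512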

Acknowledgement

Statement

  • If you find our work useful, please give us a 🌟 on GitHub and consider citing our paper:

    @misc{2402.02544,
    Author = {Dilxat Muhtar and Zhenshi Li and Feng Gu and Xueliang Zhang and Pengfeng Xiao},
    Title = {LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model},
    Year = {2024},
    Eprint = {arXiv:2402.02544},
    }
  • License: Apache
