HPT - Open Multimodal Large Language Models

Hyper-Pretrained Transformers (HPT) is a novel multimodal LLM framework from HyperGAI for training vision-language models that understand both textual and visual inputs. HPT achieves results that are highly competitive with state-of-the-art models on a variety of multimodal LLM benchmarks. This repository contains the open-source inference code for reproducing the evaluation results of HPT Air on different benchmarks. The model weights are released in a Hugging Face repository.

For more details and exciting examples of HPT, please read our technical blog post.

Overview of Model Architecture


Quick Start

Installation

pip install -r requirements.txt
pip install -e .

Prepare the Model

You can download the model weights from Hugging Face (HF) into [Local Path] and set global_model_path to [Local Path] in the model config file:

git lfs install
git clone https://huggingface.co/HyperGAI/HPT [Local Path]

or set global_model_path directly to the HF repo id ('HyperGAI/HPT').
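
The choice between the two options above can be sketched in a few lines of Python. This is only an illustration: the helper name resolve_model_path is hypothetical and not part of this repository, and the sketch returns the value you would put into global_model_path rather than editing the config file, whose format is not shown here.

```python
from pathlib import Path

# Repo id taken from this README.
HF_REPO_ID = "HyperGAI/HPT"


def resolve_model_path(local_path=None, repo_id=HF_REPO_ID):
    """Return a value for global_model_path (hypothetical helper).

    Prefers a local clone when it exists and is not empty,
    otherwise falls back to the Hugging Face repo id.
    """
    if local_path:
        path = Path(local_path).expanduser()
        # An empty or missing directory is treated as "not downloaded yet".
        if path.is_dir() and any(path.iterdir()):
            return str(path)
    return repo_id


if __name__ == "__main__":
    # "./HPT" is an example location for the git clone above.
    print(resolve_model_path("./HPT"))
```

Copy the printed value into the model config file as global_model_path.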

You can also change other settings in the config file to override our defaults.

Demo

After setting up the config file, launch the model demo for a quick trial:

python demo/demo.py --image_path [Image]  --text [Text]  --model [Config]

Example:

python demo/demo.py --image_path demo/einstein.jpg  --text 'Question: What is unusual about this image?\nAnswer:'  --model hpt-air-demo

You can try different prompts here to improve the quality of the answer.
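
If you want to try several prompts, a small wrapper that assembles the demo command may help. The sketch below is an assumption-laden illustration: build_prompt and build_demo_command are hypothetical helpers, and only the script path, flags, and 'Question: ...\nAnswer:' template come from the example command. Inside single quotes the shell passes \n as a literal backslash followed by n, so the sketch reproduces exactly that; how demo.py interprets it is not shown in this README.

```python
import shlex


def build_prompt(question):
    """Format a question with the template from the example command.

    The raw string keeps "\\n" as two characters (backslash, n), matching
    what the shell passes for a single-quoted argument.
    """
    return "Question: " + question.strip() + r"\nAnswer:"


def build_demo_command(image_path, question, config="hpt-air-demo"):
    """Build the argv list for demo/demo.py (hypothetical helper)."""
    return [
        "python", "demo/demo.py",
        "--image_path", image_path,
        "--text", build_prompt(question),
        "--model", config,
    ]


if __name__ == "__main__":
    cmd = build_demo_command("demo/einstein.jpg",
                             "What is unusual about this image?")
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

To run the demo instead of printing the command, pass the list to subprocess.run(cmd, check=True) from the repository root.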

Benchmark Evaluations

Launch the model for benchmark evaluation:

torchrun --nproc-per-node=8 run.py --data [Dataset] --model [Config]

Example:

torchrun --nproc-per-node=8 run.py --data MMMU_DEV_VAL --model hpt-air-mmmu
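
The example assumes 8 worker processes on one node, typically one per GPU. A minimal sketch for generating the same command with a different process count, or for several dataset/config pairs, could look like the following. The helper build_eval_command and the RUNS list are hypothetical; the only dataset and config names used are the ones from the example above.

```python
import shlex


def build_eval_command(dataset, config, nproc_per_node=8):
    """Build the torchrun argv list for run.py (hypothetical helper)."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    return [
        "torchrun", f"--nproc-per-node={nproc_per_node}",
        "run.py",
        "--data", dataset,
        "--model", config,
    ]


# (dataset, config) pairs to evaluate; extend this list with your own pairs.
RUNS = [("MMMU_DEV_VAL", "hpt-air-mmmu")]

if __name__ == "__main__":
    for dataset, config in RUNS:
        # Print each command; replace print with subprocess.run to execute.
        print(shlex.join(build_eval_command(dataset, config)))
```

Set nproc_per_node to the number of GPUs available on your machine.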

[1] Unless otherwise noted, all listed results are from the test set. You may need to submit the result file to the benchmark's evaluation server to obtain the final score.

Pretrained Models Used

Disclaimer and Responsible Use

Note that HPT Air is a quick open release of our models to facilitate open, responsible AI research and community development. It does not have any moderation mechanism and provides no guarantees on its results. We hope to engage with the community to make the model respect guardrails more closely, enabling practical adoption in real-world applications that require moderated outputs.

Contact Us

License

This project is released under the Apache 2.0 license. Parts of this project contain code and models from other sources, which are subject to their respective licenses; you must comply with those licenses if you want to use them for commercial purposes.

Acknowledgements

The evaluation code for running this demo was extended based on the VLMEvalKit project. We also thank OpenAI for open-sourcing their visual encoder models and 01.AI for open-sourcing their large language models.
