GUI Odyssey

This repository is the official implementation of GUI Odyssey.

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu, Wenqi Shao✉️⭐️, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo✉️
✉️ Wenqi Shao ([email protected]) and Ping Luo ([email protected]) are corresponding authors.
⭐️ Wenqi Shao is project leader.

💡 News

🔆 Introduction

GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. It consists of 7,735 episodes collected from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.

[Figure: overview]

🛠️ Data collection pipeline

GUI Odyssey comprises six categories of navigation tasks. For each category, we construct instruction templates and fill them with items and apps selected from a predefined pool, which yields a large set of unique instructions for annotating GUI episodes. Human annotators then demonstrate each instruction on an Android emulator, and the metadata of every episode is recorded. After rigorous quality checks, GUI Odyssey includes 7,735 validated cross-app GUI navigation episodes.

[Figure: pipeline]

📝 Statistics

| Splits | # Episodes | # Unique Prompts | Avg. # Steps | Data location | Model |
| --- | --- | --- | --- | --- | --- |
| Total | 7,735 | 7,735 | 15.4 | GUI-Odyssey | OdysseyAgent |
| Train-Random & Test-Random | 5,802 / 1,933 | 5,802 / 1,933 | 15.4 / 15.2 | random_split.json | OdysseyAgent-Random |
| Train-Task & Test-Task | 6,719 / 1,016 | 6,719 / 1,016 | 15.0 / 17.6 | task_split.json | OdysseyAgent-Task |
| Train-Device & Test-Device | 6,473 / 1,262 | 6,473 / 1,262 | 15.4 / 15.0 | device_split.json | OdysseyAgent-Device |
| Train-App & Test-App | 6,596 / 1,139 | 6,596 / 1,139 | 15.4 / 15.3 | app_split.json | OdysseyAgent-App |

💫 Dataset Access

The complete GUI Odyssey dataset is hosted on Hugging Face.

Clone the entire dataset from Hugging Face:

git clone https://huggingface.co/datasets/OpenGVLab/GUI-Odyssey

Then move the cloned dataset into the ./data directory. After that, the structure of ./data should look like this:

GUI-Odyssey
├── data
│   ├── annotations
│   │   └── *.json
│   ├── screenshots
│   │   └── data_*
│   │        └── *.png
│   ├── splits
│   │   ├── app_split.json
│   │   ├── device_split.json
│   │   ├── random_split.json
│   │   └── task_split.json
│   ├── format_converter.py
│   └── preprocessing.py
└── ...

Then organize the screenshots folder:

cd data
python preprocessing.py
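
The repository's preprocessing.py is not reproduced here. Judging only from the directory trees before and after this step, it appears to move the images from screenshots/data_*/ directly into screenshots/. A minimal sketch of such a flattening step follows, assuming file names are unique across the data_* folders; the function name `flatten_screenshots` is hypothetical and this is not the authors' script.

```python
import shutil
from pathlib import Path


def flatten_screenshots(screenshots_dir: Path) -> int:
    """Move screenshots/data_*/*.png up into screenshots/ and return the number moved."""
    moved = 0
    for sub in sorted(screenshots_dir.glob("data_*")):
        if not sub.is_dir():
            continue
        for png in sorted(sub.glob("*.png")):
            target = screenshots_dir / png.name
            if target.exists():
                # Assumption: names are unique across sub-folders; never overwrite silently.
                raise FileExistsError(f"{target} already exists")
            shutil.move(str(png), str(target))
            moved += 1
        # Remove the sub-folder only when nothing is left inside it.
        if not any(sub.iterdir()):
            sub.rmdir()
    return moved
```

Calling `flatten_screenshots(Path("screenshots"))` from inside ./data would apply it to the layout shown above. Prefer the official script when it works, since it may do more than this sketch.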

Finally, the structure of ./data should look like this:

GUI-Odyssey
├── data
│   ├── annotations
│   │   └── *.json
│   ├── screenshots
│   │   └── *.png
│   ├── splits
│   │   ├── app_split.json
│   │   ├── device_split.json
│   │   ├── random_split.json
│   │   └── task_split.json
│   ├── format_converter.py
│   └── preprocessing.py
└── ...
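
Before running any training or evaluation, it can help to confirm that ./data matches the tree above. The following sketch checks only the entries that the tree names; the helper names `_has` and `missing_entries` are illustrative and do not exist in the repository.

```python
from pathlib import Path

# Split files listed in the directory tree.
SPLIT_FILES = ("app_split.json", "device_split.json",
               "random_split.json", "task_split.json")


def _has(directory: Path, pattern: str) -> bool:
    """Return True if the directory exists and holds at least one match."""
    return directory.is_dir() and any(directory.glob(pattern))


def missing_entries(data_dir: Path) -> list[str]:
    """List the entries of the expected layout that are absent from data_dir."""
    problems = []
    if not _has(data_dir / "annotations", "*.json"):
        problems.append("annotations/*.json")
    if not _has(data_dir / "screenshots", "*.png"):
        problems.append("screenshots/*.png")
    for name in SPLIT_FILES:
        if not (data_dir / "splits" / name).is_file():
            problems.append(f"splits/{name}")
    for name in ("format_converter.py", "preprocessing.py"):
        if not (data_dir / name).is_file():
            problems.append(name)
    return problems
```

An empty list from `missing_entries(Path("data"))` means every entry named in the tree is present. It does not validate the contents of the files.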

⚙️ Detailed Data Information

Please refer to this.

🚀 Quick Start

Please refer to this to get started quickly.

📖 Release Process

  • Dataset
    • Screenshots of GUI Odyssey
    • Annotations of GUI Odyssey
    • Split files of GUI Odyssey
  • Code
    • Data preprocessing code
    • Inference code
  • Models

🖊️ Citation

If you find GUI Odyssey useful in your project or research, please cite our paper using the following BibTeX entry. Thanks!

@misc{lu2024gui,
      title={GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices}, 
      author={Quanfeng Lu and Wenqi Shao and Zitao Liu and Fanqing Meng and Boxuan Li and Botong Chen and Siyuan Huang and Kaipeng Zhang and Yu Qiao and Ping Luo},
      year={2024},
      eprint={2406.08451},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

gui-odyssey's Issues

Model checkpoint

Hi there,

Thank you for your wonderful work. Do you have any plans to publish the model ckpt recently?

Not fully reproduced

This is the result of my evaluation of the app split model.
[image]
It is lower than the figure reported in the paper. What could be the reason?

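Could you provide the training script?

The repository seems to contain only the fine-tuning script. Initializing the weights from Qwen and adjusting the trainable parameters are not reflected in it.

If I want to reproduce the training process, would it be enough to load the Qwen weights into the corresponding positions of the Odyssey model and follow the parameter-freezing settings described in the paper?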

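Bug report

The code in the repository currently does not run as-is (for example, the test script evaluate_GUIOdyssey.py).
Here are the fixes for the problems I ran into:

  1. Qwen's tokenizer_qwen file raises an IMAGE_ST-related error. Moving super().__init__ below IMAGE_ST fixes it.
  2. To load files locally, replace line 41 of the modeling_qwen file with the following: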
from .configuration_qwen import QWenConfig
from qwen_generation_utils import (
    HistoryType,
    make_context,
    decode_tokens,
    get_stop_words_ids,
    StopWordsLogitsProcessor,
)
from .visual import VisionTransformer
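  3. In the code that builds his_index, i.e. format_converter.py in the data directory, change line 104 to test_anno_base, and on line 168 change data['history'] to data['history_screenshot'].

One more suggestion about his_index: it would help to add a way to handle inputs without history, such as the single-image case, instead of requiring his_index every time, which is rather cumbersome.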
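
The first fix in the list above (moving super().__init__ below IMAGE_ST) is an instance of a general Python ordering issue: if a base-class constructor calls a method that the subclass overrides, every attribute that method reads must already be set. The sketch below reproduces the pattern with stand-in classes. `BaseTokenizer`, `BrokenTokenizer`, `FixedTokenizer`, and the token strings are all hypothetical placeholders, not the actual Qwen or transformers code.

```python
class BaseTokenizer:
    """Stand-in for a library base class whose __init__ calls an overridable hook."""

    def __init__(self) -> None:
        # The base constructor queries the subclass while it is still being built.
        self.vocab_size = len(self.get_vocab())

    def get_vocab(self) -> dict:
        raise NotImplementedError


class BrokenTokenizer(BaseTokenizer):
    def __init__(self) -> None:
        super().__init__()  # get_vocab() runs here, before IMAGE_ST exists
        self.IMAGE_ST = ("<ref>", "</ref>", "<box>", "</box>")  # placeholder tokens

    def get_vocab(self) -> dict:
        return {tok: i for i, tok in enumerate(self.IMAGE_ST)}


class FixedTokenizer(BaseTokenizer):
    def __init__(self) -> None:
        self.IMAGE_ST = ("<ref>", "</ref>", "<box>", "</box>")  # placeholder tokens
        super().__init__()  # safe: IMAGE_ST is already set when get_vocab() runs

    def get_vocab(self) -> dict:
        return {tok: i for i, tok in enumerate(self.IMAGE_ST)}


try:
    BrokenTokenizer()
except AttributeError as err:
    print("broken ordering:", err)

print("fixed ordering, vocab size:", FixedTokenizer().vocab_size)
```

Whether the real tokenizer fails for exactly this reason depends on the installed library version, so treat this as an explanation of the pattern rather than a diagnosis.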

Data release timeline

Hi there,

I really appreciate the high-quality GUI data introduced in this work. Could I ask if there is an estimate for the release time of the dataset?

Thank you!
