Code Monkey home page Code Monkey logo

general-purpose-ar-task-guidance's Introduction

TAGGAR: General-Purpose Task Guidance from Natural Language in Augmented Reality using Vision-Language Models

  • Unity code running on HoloLens 2 is located in the Assets/Scripts directory
  • Python code running on server is located in the object_detection_scripts directory
  • This research done under Dr. Doug Bowman at Virginia Tech's 3D Interaction Group, as part of my Master's Thesis.
  • Video demo (2 minutes) of our system is attached to publication link

Abstract: Augmented reality (AR) task guidance systems provide assistance for procedural tasks by rendering virtual guidance visuals within the real-world environment. Current AR task guidance systems are limited in that they require AR system experts to manually place visuals, require models of real-world objects, or only function for limited tasks or environments. We propose a general-purpose AR task guidance approach for tasks defined by natural language. Our approach allows an operator to take pictures of relevant objects and write task instructions for an end user, which are used by the system to determine where to place guidance visuals. Then, an end user can receive and follow guidance even if objects change locations or environments. Our approach utilizes current vision-language machine learning models for text and image semantic understanding and object localization. We built a proof-of-concept system called TAGGAR using our approach and tested its accuracy and usability in a user study. We found that all operators were able to generate clear guidance for tasks and end users were able to follow the guidance visuals to complete the expected action 85.7% of the time without any knowledge of the tasks.

System Overview

operator_and_user_intro_diagram.pdf

System Diagram

system_design_diagram.pdf

general-purpose-ar-task-guidance's People

Contributors

dstovervt avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.