
aingenuity's Issues

Ensure default pointer actions (f.e. drag and drop) do not interfere with scene interaction

Default pointer (mouse cursor, finger pointer, etc.) interactions are currently not disabled, so sometimes a drag-and-drop action will happen when dragging somewhere in the app and will break the camera-look behavior.

We need to disable default behaviors (and re-enable them when/if we need them) because they are designed for default web experiences that revolve around 2D text and images (f.e. a default behavior is dragging an image or text to the address bar to open it directly, or dropping it into a file explorer, etc.).

Use the CSS touch-action property to control this, and perhaps pointer-events: none on elements that should have no interaction.
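A minimal sketch of the CSS, assuming a hypothetical `.scene` container class (the class names are illustrative, not from the codebase):

```css
/* Prevent default touch gestures (scrolling, pinch-zoom) from hijacking
   pointer input inside the 3D scene container. */
.scene {
  touch-action: none;
  /* Prevent native text selection and drag-and-drop of images/text
     inside the scene, which break camera-look dragging. */
  user-select: none;
  -webkit-user-drag: none;
}

/* Purely decorative overlays should not intercept pointer events at all. */
.scene .decorative-overlay {
  pointer-events: none;
}
```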

clean up the entry point of the app, currently mixed between our own and Sumerian's

There are effectively two entry points: main.js, with its own UI added on top of the Sumerian stuff, and AutoApp.js, with the adapted and modified Sumerian code including all of the Sumerian UI (f.e. the loading screen, and the text inputs which are currently hidden).

This is quite ugly; instead we should coalesce them into a single, easier-to-manage entry point, without using global variables to share things across both.

add interpolation to WASD controls to make movement nice and smooth

Right now pressing WASD keys simply increments the position.x/z values of the <lume-perspective-camera> element, which results in abrupt rather than smooth movement.

To make it smoother, we can increment the values of a target position object, then run a linear interpolation of the camera position towards that target on every animation frame.
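A sketch of the idea; the `lerp` helper, the smoothing factor, and the `cam` shape are illustrative, assuming a camera element with a `position` object:

```javascript
// Linear interpolation: move `a` toward `b` by fraction `t` (0..1).
function lerp(a, b, t) {
  return a + (b - a) * t
}

// Target position updated directly by the WASD key handlers.
const target = { x: 0, z: 0 }

// Hypothetical per-frame update: ease the camera toward the target
// instead of snapping position.x/z on each keypress.
function updateCamera(cam) {
  const t = 0.1 // smoothing factor; smaller = smoother but slower
  cam.position.x = lerp(cam.position.x, target.x, t)
  cam.position.z = lerp(cam.position.z, target.z, t)
}
```

With this, key handlers only mutate `target`, and the camera catches up a fraction of the remaining distance each animation frame.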

convert from regular DOM/CSS UI to WebGL-powered UI so that we can render it inside WebXR.

DOM/CSS UI is not currently able to transcend into WebXR space, so we need to come up with a method for rendering UI inside WebGL. One of these approaches:

  1. Render basic DOM/CSS to a canvas using the SVG trick, which has limitations on which CSS features are supported, and incurs a bit of performance cost every frame.
    • This might be the simplest way to support the wide set of UI capabilities already built into the browser, and leverage them in WebXR.
  2. Create a UI system out of WebGL primitives
    • This is perhaps more complicated than option 1, but will eventually yield better results: full control over rendering inside of WebXR without limitations, and better performance.
    • We would need to re-implement things like text flow, font rendering, etc. (for which some libraries for Three.js already exist).
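Option 1 is typically done with SVG's `<foreignObject>`. A sketch, where the function names are illustrative and the browser-only rasterization step is kept separate from the pure markup-building step:

```javascript
// Wrap an HTML fragment in an SVG <foreignObject> so the browser can
// rasterize styled DOM content as an image.
function buildSvgMarkup(html, width, height) {
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<foreignObject width="100%" height="100%">` +
    `<div xmlns="http://www.w3.org/1999/xhtml">${html}</div>` +
    `</foreignObject></svg>`
}

// Browser-only: draw the SVG into a canvas, which can then be used as a
// WebGL texture (f.e. on a plane in the scene).
function drawHtmlToCanvas(html, canvas) {
  const svg = buildSvgMarkup(html, canvas.width, canvas.height)
  const blob = new Blob([svg], { type: 'image/svg+xml' })
  const url = URL.createObjectURL(blob)
  const img = new Image()
  img.onload = () => {
    canvas.getContext('2d').drawImage(img, 0, 0)
    URL.revokeObjectURL(url)
  }
  img.src = url
}
```

Note the known limitations: external resources generally don't load inside the SVG, and re-rasterizing every frame costs time, which is why option 2 can win long-term.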

take over the AWS Sumerian code so we can manage dependencies, update Three.js, and enable building new features in a scalable way.

Rather than copy/pasting a build file into the repo, bring the AWS Sumerian code into the repo and own it.

Then we can update Three.js and make sure it matches between both the Sumerian code and Lume, etc.

This will allow us to develop further features in an easy way.

Optionally, rather than bringing in the code, we could keep it in our fork of the sumerian repo that we manage in our own way, so we can easily update it and consume it.

WIP:

  • aws-samples/amazon-sumerian-hosts#182 -- This greatly reduces maintenance burden for amazon-sumerian-hosts.
  • aws-samples/amazon-sumerian-hosts#155 -- For completeness' sake: we can't fully own the amazon-sumerian-hosts code until we fix this one issue with Babylon (we're not currently using Babylon, but in due time this will make the hosts usable with the main web rendering systems).

Make the AWS key not public (get a new key!) and use AWS on the backend instead of in the client.

We currently use AWS APIs on the client side, exposing our key in the client, which means anyone can take it.

We need to

  • move AWS API usage to the backend
  • then make a new key that is not visible on the client (tedious process that involves multiple steps to wire up multiple AWS services before finally being able to get an Identity key for the setup)

First we want to complete

so that we're in control of dependencies and all of the code (instead of relying on AWS APIs being embedded into a client-side bundle).

Enable VR mode

Enabling VR mode is straightforward with Three.js, but for testing it will require (for me at least)

because I am not able to get Quest developer mode working in order to use localhost URLs. So I'll consider this issue done once I can actually test it online, since I don't have localhost access on Oculus.

make a deployment for dev testing online

Some people (f.e. me) may be unable to get Oculus Quest developer mode working, and unable to inspect web contexts with Chrome devtools from a desktop/laptop.

To work around this, a dev deployment (f.e. on nodechef) will be useful as a way to access the app over HTTPS instead of localhost. (Developer mode is needed to access localhost from a desktop/laptop on the headset, and the WebXR API works only on HTTPS or localhost URLs, a security limitation of the WebXR API in browsers.)

Add crouch controls

This is WIP. Currently, holding Shift makes the camera keep moving downward; instead, we want to move to a lower height while Shift is held, then return to normal height when it is released.
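One way to structure that, reusing the ease-toward-a-target idea from the WASD issue; the heights, key names, and camera shape are illustrative:

```javascript
const STAND_HEIGHT = 1.6 // meters; hypothetical default camera height
const CROUCH_HEIGHT = 0.9

let crouching = false

// Key handlers toggle state instead of moving the camera directly.
function onKeyDown(key) { if (key === 'Shift') crouching = true }
function onKeyUp(key) { if (key === 'Shift') crouching = false }

function targetHeight() {
  return crouching ? CROUCH_HEIGHT : STAND_HEIGHT
}

// Per-frame: ease camera.position.y toward targetHeight() rather than
// decrementing it forever while Shift is held.
function updateCrouch(cam, t = 0.15) {
  cam.position.y += (targetHeight() - cam.position.y) * t
}
```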

show user speech text in the UI

When the user hits the record button, show the recognized text in the UI. We can show it within the input field, and disable the input field while the user is talking, much like in this demo.

The backend already gets the speech-to-text result and sends that off to eventually get a response from OpenAI. We just need to show that text in the UI.

The record button currently records for 4 seconds, then we get a whole text result (unlike the above demo that streams the text piece by piece). For now, we can just show the whole text result at once, to start off with. Then, after

we will likely have the ability to stream the text as the user talks to make it more realtime.

Make the scene a little more interesting.

F.e. add a desk or table that the character is standing or sitting behind. Perhaps add a room around the scene so it feels like we are inside a house, or something. For now we can use free 3D models from sites like Sketchfab.

Here's a Three.js example with a character sitting in a room for some inspiration:

https://threejs.org/examples/?q=skin#webgl_animation_skinning_ik

  • use GLTF as much as possible (the most standard and open 3D format)
  • use <lume-gltf-model src="...."> elements to load models
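A markup sketch of what that could look like with Lume; the file paths, positions, and model choices are hypothetical:

```html
<lume-scene webgl>
  <lume-perspective-camera active position="0 1.6 3"></lume-perspective-camera>
  <!-- Hypothetical paths; free GLTF models, f.e. from Sketchfab -->
  <lume-gltf-model src="./models/room.glb"></lume-gltf-model>
  <lume-gltf-model src="./models/desk.glb" position="0 0 -1"></lume-gltf-model>
</lume-scene>
```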

Make the record button finish recording when done talking for an amount of time, rather than strictly after 4 seconds.

The record button currently records for strictly 4 seconds. Instead, we want to be able to stop recording once no more speech is detected for some amount of time. This will require making some changes to how the Web Audio objects are managed, and I'm not sure exactly how to know when no more speech is happening (Does Web Audio provide some way? Do we need to implement audio processing ourselves? Or is there a lib?).
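One possible answer to the "how do we know speech stopped?" question (an assumption, not a confirmed plan): feed mic samples from a Web Audio AnalyserNode (via `analyser.getFloatTimeDomainData(buf)` each animation frame) into an RMS-level check, and stop once the level stays below a threshold for some hold time. The threshold and hold time here are guesses to be tuned by ear:

```javascript
// Root-mean-square level of an audio sample buffer (values in -1..1).
function rms(samples) {
  let sum = 0
  for (const s of samples) sum += s * s
  return Math.sqrt(sum / samples.length)
}

// Tracks how long the signal has been quiet; returns true when it's
// time to stop recording.
function makeSilenceDetector({ threshold = 0.01, holdMs = 1200 } = {}) {
  let quietSince = null
  return function update(samples, nowMs) {
    if (rms(samples) >= threshold) {
      quietSince = null // speech detected, reset the timer
      return false
    }
    if (quietSince === null) quietSince = nowMs
    return nowMs - quietSince >= holdMs
  }
}
```

A plain level check like this will treat background noise as speech; a dedicated voice-activity-detection library would be more robust if this proves too fiddly.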

As an alternative, we may want to simply show a countdown timer (f.e. a circular progress ring that shrinks as time runs out), so the user has a constant-but-known time limit for saying responses.
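The countdown ring can be drawn with an SVG circle using the standard stroke-dasharray/stroke-dashoffset trick; the only logic needed is the geometry (the radius here is illustrative):

```javascript
// For an SVG <circle> of the given radius, compute the stroke-dashoffset
// that leaves a fraction `remaining` (1 = full ring, 0 = empty) visible.
function ringDashOffset(radius, remaining) {
  const circumference = 2 * Math.PI * radius
  return circumference * (1 - remaining)
}
// Usage: set stroke-dasharray to the full circumference once, then update
// stroke-dashoffset each frame with ringDashOffset(r, timeLeft / totalTime).
```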

Improve chat input field, use a triangle/send icon for submit button, use a microphone/record icon for record button

Aiming to surpass, or at least meet, the level of polish in this chat demo, improve the chat input, submit button, record button, and chat message list.

Demo screenshot:

Screenshot 2023-08-24 at 11 50 42 AM

Our current app:

Screenshot 2023-10-20 at 12 05 59 AM

We don't need to copy the demo, we just need to get it to a similar or better level of polish. Having our own style is better. :)

Notice in the inspiration demo that the record button is not near the chat. Use your discretion for where to place the microphone/record button, feel free to modify the layout if desired, etc. The main goal is to take inspiration from the level of polish, but not literally copy it.

meteor auth

I tried to log in to Meteor on the official website with the credentials created in unnamed-experiment, without success. I wanted to check what has been updated there.

Implement end-of-speech detection

so that we can automatically send speech when the user has finished speaking.

Or, perhaps make it push-to-talk, so that when the user lets go we know when to send.

Right now the record button records for a hard-coded 4 seconds, and then sends.
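The push-to-talk variant is mostly a small state machine: press to start recording, release to stop and send. A sketch (the `onSend` callback is hypothetical; in the app, press/release would also start/stop the actual audio capture, f.e. a MediaRecorder):

```javascript
// Minimal push-to-talk state machine: 'idle' -> 'recording' on press,
// 'recording' -> 'idle' on release, which is when we send the audio.
function makePushToTalk(onSend) {
  let state = 'idle'
  return {
    get state() { return state },
    press() {
      if (state === 'idle') state = 'recording'
      // In the app: start capturing audio here.
    },
    release() {
      if (state !== 'recording') return
      state = 'idle'
      // In the app: stop capturing and send the recorded audio.
      onSend()
    },
  }
}
```

Wiring `press`/`release` to pointerdown/pointerup on the record button (plus pointercancel, so a dropped touch doesn't leave it stuck recording) would replace the hard-coded 4-second timer.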
