
aingenuity's Issues

Ensure default pointer actions (f.e. drag and drop) do not interfere with scene interaction

Default pointer (mouse cursor, finger pointer, etc.) interactions are currently not disabled, so sometimes a drag-and-drop action will happen when dragging somewhere in the app and will break the camera-look behavior.

We need to disable default behaviors (and re-enable them when/if we need them) because they are designed for default web experiences that revolve around 2D text and images (f.e. a default behavior is dragging an image or text to the address bar to open it directly, or dropping it into a file explorer, etc.).

Use the CSS touch-action property to control this, and perhaps pointer-events: none on elements that should have no interaction.
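A minimal sketch of the CSS, assuming a hypothetical `.scene` container class (the class names are illustrative, not from the codebase):

```css
/* Prevent default touch gestures (scrolling, pinch-zoom) from hijacking
   pointer input inside the 3D scene container. */
.scene {
  touch-action: none;
  /* Prevent native text selection and drag-and-drop of images/text
     inside the scene, which break camera-look dragging. */
  user-select: none;
  -webkit-user-drag: none;
}

/* Purely decorative overlays should not intercept pointer events at all. */
.scene .decorative-overlay {
  pointer-events: none;
}
```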

clean up the entry point of the app, currently mixed between our own and Sumerian's

There are effectively two entry points: main.js, with its own UI added on top of the Sumerian stuff, and AutoApp.js, with the adapted and modified Sumerian code including all of the Sumerian UI (f.e. the loading screen, and the text inputs which are currently hidden).

This is quite ugly; instead we should coalesce them into a single, easier-to-manage entry point, without using global variables to share things across both.

add interpolation to WASD controls to make movement nice and smooth

Right now pressing WASD keys simply increments the position.x/z values of the <lume-perspective-camera> element, which results in abrupt rather than smooth movement.

To make it smoother, we can increment the values of a target position object, then run a linear interpolation of the camera position towards that target on every animation frame.
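A sketch of the idea; the `lerp` helper, the smoothing factor, and the `cam` shape are illustrative, assuming a camera element with a `position` object:

```javascript
// Linear interpolation: move `a` toward `b` by fraction `t` (0..1).
function lerp(a, b, t) {
  return a + (b - a) * t
}

// Target position updated directly by the WASD key handlers.
const target = { x: 0, z: 0 }

// Hypothetical per-frame update: ease the camera toward the target
// instead of snapping position.x/z on each keypress.
function updateCamera(cam) {
  const t = 0.1 // smoothing factor; smaller = smoother but slower
  cam.position.x = lerp(cam.position.x, target.x, t)
  cam.position.z = lerp(cam.position.z, target.z, t)
}
```

With this, key handlers only mutate `target`, and the camera catches up a fraction of the remaining distance each animation frame.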

convert from regular DOM/CSS UI to WebGL-powered UI so that we can render it inside WebXR.

DOM/CSS UI is not currently able to transcend into WebXR space, so we need to come up with a method for rendering UI inside WebGL. One of these approaches:

  1. Render basic DOM/CSS to a canvas using the SVG trick, which has limitations on which CSS features are supported, and incurs a bit of performance cost every frame.
    • This might be the simplest way to support the wide set of UI capabilities already built into the browser, and leverage them in WebXR.
  2. Create a UI system out of WebGL primitives
    • This is perhaps more complicated than option 1, but will eventually yield better results: full control over rendering inside of WebXR without limitations, and better performance.
    • We would need to re-implement things like text flow, font rendering, etc. (for which some libraries for Three.js already exist).
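Option 1 is typically done with SVG's `<foreignObject>`. A sketch, where the function names are illustrative and the browser-only rasterization step is kept separate from the pure markup-building step:

```javascript
// Wrap an HTML fragment in an SVG <foreignObject> so the browser can
// rasterize styled DOM content as an image.
function buildSvgMarkup(html, width, height) {
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<foreignObject width="100%" height="100%">` +
    `<div xmlns="http://www.w3.org/1999/xhtml">${html}</div>` +
    `</foreignObject></svg>`
}

// Browser-only: draw the SVG into a canvas, which can then be used as a
// WebGL texture (f.e. on a plane in the scene).
function drawHtmlToCanvas(html, canvas) {
  const svg = buildSvgMarkup(html, canvas.width, canvas.height)
  const blob = new Blob([svg], { type: 'image/svg+xml' })
  const url = URL.createObjectURL(blob)
  const img = new Image()
  img.onload = () => {
    canvas.getContext('2d').drawImage(img, 0, 0)
    URL.revokeObjectURL(url)
  }
  img.src = url
}
```

Note the known limitations: external resources generally don't load inside the SVG, and re-rasterizing every frame costs time, which is why option 2 can win long-term.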

take over the AWS Sumerian code so we can manage dependencies, update Three.js, and enable building new features in a scalable way.

Rather than copy/pasting a build file into the repo, bring the AWS Sumerian code into the repo and own it.

Then we can update Three.js and make sure it matches between both the Sumerian code and Lume, etc.

This will allow us to develop further features in an easy way.

Optionally, rather than bringing in the code, we could keep it in our fork of the sumerian repo that we manage in our own way, so we can easily update it and consume it.

WIP:

  • aws-samples/amazon-sumerian-hosts#182 -- This greatly reduces maintenance burden for amazon-sumerian-hosts.
  • aws-samples/amazon-sumerian-hosts#155 -- For completeness' sake: we can't fully own the amazon-sumerian-hosts code until we fix this one issue with Babylon (we're not currently using Babylon, but in due time this will make the hosts usable with the main web rendering systems).

Make the AWS key not public (get a new key!) and use AWS on the backend instead of in the client.

We currently use AWS APIs on the client side, exposing our key in the client, which means anyone can take it.

We need to

  • move AWS API usage to the backend
  • then make a new key that is not visible on the client (tedious process that involves multiple steps to wire up multiple AWS services before finally being able to get an Identity key for the setup)

First we want to complete

so that we're in control of dependencies and all of the code (instead of relying on AWS APIs being embedded into a client-side bundle).

Enable VR mode

Enabling VR mode is straightforward with Three.js, but for testing it will require (for me at least)

because I am not able to get Quest developer mode working in order to use localhost URLs. So I'll consider this issue done once I can actually test it online, since I don't have localhost access on Oculus.

make a deployment for dev testing online

Some people (f.e. me) may be unable to get Oculus Quest developer mode working, and unable to inspect web contexts with Chrome devtools from a desktop/laptop.

To work around this, a dev deployment (f.e. on nodechef) will be useful as a way to access the app over HTTPS instead of localhost. (Developer mode is needed to access localhost from a desktop/laptop on the headset, and the WebXR API works only on HTTPS or localhost URLs, a security limitation of the WebXR API in browsers.)

Add crouch controls

This is WIP. Currently, holding Shift makes the camera keep moving downward; instead, we want to move to a lower height while Shift is held, then return to normal height when it is released.
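One way to structure that, reusing the ease-toward-a-target idea from the WASD issue; the heights, key names, and camera shape are illustrative:

```javascript
const STAND_HEIGHT = 1.6 // meters; hypothetical default camera height
const CROUCH_HEIGHT = 0.9

let crouching = false

// Key handlers toggle state instead of moving the camera directly.
function onKeyDown(key) { if (key === 'Shift') crouching = true }
function onKeyUp(key) { if (key === 'Shift') crouching = false }

function targetHeight() {
  return crouching ? CROUCH_HEIGHT : STAND_HEIGHT
}

// Per-frame: ease camera.position.y toward targetHeight() rather than
// decrementing it forever while Shift is held.
function updateCrouch(cam, t = 0.15) {
  cam.position.y += (targetHeight() - cam.position.y) * t
}
```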

show user speech text in the UI

When the user hits the record button, show the recognized text in the UI. We can show it within the input field, and disable the input field while the user is talking, much like in this demo.

The backend already gets the speech-to-text result and sends that off to eventually get a response from OpenAI. We just need to show that text in the UI.

The record button currently records for 4 seconds, then we get a whole text result (unlike the above demo that streams the text piece by piece). For now, we can just show the whole text result at once, to start off with. Then, after

we will likely have the ability to stream the text as the user talks to make it more realtime.

Make the scene a little more interesting.

F.e. add a desk or table that the character is standing or sitting behind. Perhaps add a room around the scene so it feels like we are inside a house, or something. For now we can use free 3D models from sites like Sketchfab.

Here's a Three.js example with a character sitting in a room for some inspiration:

https://threejs.org/examples/?q=skin#webgl_animation_skinning_ik

  • use GLTF as much as possible (the most standard and open 3D format)
  • use <lume-gltf-model src="...."> elements to load models
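A markup sketch of what that could look like with Lume; the file paths, positions, and model choices are hypothetical:

```html
<lume-scene webgl>
  <lume-perspective-camera active position="0 1.6 3"></lume-perspective-camera>
  <!-- Hypothetical paths; free GLTF models, f.e. from Sketchfab -->
  <lume-gltf-model src="./models/room.glb"></lume-gltf-model>
  <lume-gltf-model src="./models/desk.glb" position="0 0 -1"></lume-gltf-model>
</lume-scene>
```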

Make the record button finish recording when done talking for an amount of time, rather than strictly after 4 seconds.

The record button currently records for strictly 4 seconds. Instead, we want to be able to stop recording once no more speech is detected for some amount of time. This will require making some changes to how the Web Audio objects are managed, and I'm not sure exactly how to know when no more speech is happening (Does Web Audio provide some way? Do we need to implement audio processing ourselves? Or is there a lib?).
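One possible answer to the "how do we know speech stopped?" question (an assumption, not a confirmed plan): feed mic samples from a Web Audio AnalyserNode (via `analyser.getFloatTimeDomainData(buf)` each animation frame) into an RMS-level check, and stop once the level stays below a threshold for some hold time. The threshold and hold time here are guesses to be tuned by ear:

```javascript
// Root-mean-square level of an audio sample buffer (values in -1..1).
function rms(samples) {
  let sum = 0
  for (const s of samples) sum += s * s
  return Math.sqrt(sum / samples.length)
}

// Tracks how long the signal has been quiet; returns true when it's
// time to stop recording.
function makeSilenceDetector({ threshold = 0.01, holdMs = 1200 } = {}) {
  let quietSince = null
  return function update(samples, nowMs) {
    if (rms(samples) >= threshold) {
      quietSince = null // speech detected, reset the timer
      return false
    }
    if (quietSince === null) quietSince = nowMs
    return nowMs - quietSince >= holdMs
  }
}
```

A plain level check like this will treat background noise as speech; a dedicated voice-activity-detection library would be more robust if this proves too fiddly.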

As an alternative, we may want to simply show a countdown timer (f.e. a circular progress ring that shrinks as time runs out), so the user has a constant-but-known time limit for saying responses.
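The countdown ring can be drawn with an SVG circle using the standard stroke-dasharray/stroke-dashoffset trick; the only logic needed is the geometry (the radius here is illustrative):

```javascript
// For an SVG <circle> of the given radius, compute the stroke-dashoffset
// that leaves a fraction `remaining` (1 = full ring, 0 = empty) visible.
function ringDashOffset(radius, remaining) {
  const circumference = 2 * Math.PI * radius
  return circumference * (1 - remaining)
}
// Usage: set stroke-dasharray to the full circumference once, then update
// stroke-dashoffset each frame with ringDashOffset(r, timeLeft / totalTime).
```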

Improve chat input field, use a triangle/send icon for submit button, use a microphone/record icon for record button

Aiming to surpass, or at least meet, the level of polish in this chat demo, improve the chat input, submit button, record button, and chat message list.

Demo screenshot:

Screenshot 2023-08-24 at 11 50 42 AM

Our current app:

Screenshot 2023-10-20 at 12 05 59 AM

We don't need to copy the demo, we just need to get it to a similar or better level of polish. Having our own style is better. :)

Notice in the inspiration demo that the record button is not near the chat. Use your discretion for where to place the microphone/record button, feel free to modify the layout if desired, etc. The main goal is to take inspiration from the level of polish, but not literally copy it.

meteor auth

I tried to log in to Meteor on the official website with the credentials created in unnamed-experiment, without success. I wanted to check what has been updated there.

Implement end-of-speech detection

so that we can automatically send speech when the user has finished speaking.

Or, perhaps make it push-to-talk, so that when the user lets go we know when to send.

Right now the record button records for a hard-coded 4 seconds, and then sends.
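The push-to-talk variant is mostly a small state machine: press to start recording, release to stop and send. A sketch (the `onSend` callback is hypothetical; in the app, press/release would also start/stop the actual audio capture, f.e. a MediaRecorder):

```javascript
// Minimal push-to-talk state machine: 'idle' -> 'recording' on press,
// 'recording' -> 'idle' on release, which is when we send the audio.
function makePushToTalk(onSend) {
  let state = 'idle'
  return {
    get state() { return state },
    press() {
      if (state === 'idle') state = 'recording'
      // In the app: start capturing audio here.
    },
    release() {
      if (state !== 'recording') return
      state = 'idle'
      // In the app: stop capturing and send the recorded audio.
      onSend()
    },
  }
}
```

Wiring `press`/`release` to pointerdown/pointerup on the record button (plus pointercancel, so a dropped touch doesn't leave it stuck recording) would replace the hard-coded 4-second timer.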
