autonomys / aingenuity: unnamed experiment
Default pointer (mouse cursor, finger pointer, etc) interactions are currently not disabled, so sometimes a drag-and-drop action will happen when dragging somewhere in the app and will break the camera-look behavior.
We need to disable default behaviors (and re-enable them when/if we need them) because they are designed for default web experiences that revolve around 2D text and images (f.e. one default behavior is dragging an image or text to the address bar to open it directly, or dropping it in a file explorer, etc).
Use CSS `touch-action` to control this, and perhaps `pointer-events: none` on elements that should have no interaction.
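A minimal sketch of the CSS we might start with (the class names are made up; adjust selectors to the app's actual markup):

```css
/* Disable default browser touch gestures (panning, pinch-zoom) on the 3D
   viewport so pointer drags reach the camera-look handler instead. */
.scene-viewport {
  touch-action: none;      /* no browser-level panning/zooming */
  user-select: none;       /* no accidental text selection while dragging */
  -webkit-user-drag: none; /* no image/link drag behavior on WebKit */
}

/* Purely decorative overlay elements should not eat pointer events. */
.decorative-overlay {
  pointer-events: none;
}
```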
There are kinda two entry points: main.js, with its own UI added on top of the Sumerian stuff, and AutoApp.js, with the adapted and modified Sumerian code including all of the Sumerian UI (f.e. the loading screen, and the text inputs which are currently hidden).
This is quite ugly; instead we should coalesce into a single, easier-to-manage entry point, without using global variables to share things across the two.
Right now pressing WASD keys simply increments the position.x/z values of the `<lume-perspective-camera>` element, which results in movement that is abrupt rather than smooth.
To make it smoother, we can increment the values of a target position object, then run a linear interpolation of the camera position towards that target on every animation frame.
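The idea above might look something like this (names are illustrative, not the app's actual API):

```javascript
// Frame-based smoothing: each keypress moves a *target*, and every animation
// frame the camera position is linearly interpolated toward that target.
const target = { x: 0, z: 0 }
const camera = { x: 0, z: 0 } // stand-in for the <lume-perspective-camera> position

function onKeyDown(key) {
  const step = 1
  if (key === 'w') target.z -= step
  if (key === 's') target.z += step
  if (key === 'a') target.x -= step
  if (key === 'd') target.x += step
}

function lerp(a, b, t) {
  return a + (b - a) * t
}

// Call once per animation frame (f.e. from requestAnimationFrame or a Lume
// render task). A factor of 0.1 gives a nice exponential ease-out feel.
function smoothStep() {
  camera.x = lerp(camera.x, target.x, 0.1)
  camera.z = lerp(camera.z, target.z, 0.1)
}
```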
DOM/CSS UI is not currently able to transcend into WebXR space, so we need to come up with a method for UI inside WebGL. One of these approaches:
Right now we can walk through walls.
Here's a simple approach we could take using basic physics:
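F.e. (a rough sketch with made-up names and values, not the app's actual code): represent each wall as an axis-aligned box on the XZ plane, and after applying movement, push the camera position back out of any box it has entered.

```javascript
// Axis-aligned bounding boxes for walls; values are illustrative.
const walls = [
  { minX: -5, maxX: 5, minZ: 9.5, maxZ: 10.5 }, // a wall along the back
]

// Correct a proposed camera position so it cannot end up inside a wall box.
// A real version would also handle Y, and sliding along the wall instead of
// stopping dead.
function resolveCollision(pos) {
  for (const w of walls) {
    const inside =
      pos.x > w.minX && pos.x < w.maxX && pos.z > w.minZ && pos.z < w.maxZ
    if (!inside) continue
    // Push out along the axis of least penetration.
    const dx = Math.min(pos.x - w.minX, w.maxX - pos.x)
    const dz = Math.min(pos.z - w.minZ, w.maxZ - pos.z)
    if (dx < dz) pos.x = pos.x - w.minX < w.maxX - pos.x ? w.minX : w.maxX
    else pos.z = pos.z - w.minZ < w.maxZ - pos.z ? w.minZ : w.maxZ
  }
  return pos
}
```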
Rather than copy/pasting a build file into the repo, bring the AWS Sumerian code into the repo and own it.
Then we can update Three.js and make sure it matches between both the Sumerian code and Lume, etc.
This will allow us to develop further features in an easy way.
Optionally, rather than bringing in the code, we could keep it in our fork of the sumerian repo that we manage in our own way, so we can easily update it and consume it.
WIP:
amazon-sumerian-hosts
code until we fix this one issue with Babylon (even though we're not currently using Babylon, in due time we will make this highly usable with the main three web rendering systems).
We currently use AWS APIs on the client side, exposing our key on the client, which means anyone can take it.
We need to
First we want to complete
so that we're in control of dependencies and all of the code (instead of relying on AWS APIs being embedded into a client-side bundle).
Enabling VR mode is straightforward with Three.js, but testing it will require (for me at least)
because I am not able to get Quest developer mode working in order to use localhost URLs. So I'll consider this issue done once I can actually test it online, given that I don't have localhost access on Oculus.
Pressing WASD keys moves the camera on world X/Z axes, but instead WASD should move the camera relative to the direction the camera is looking (f.e. W should always move in the direction the camera is facing).
A small amount of trig is needed.
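The trig might look something like this (a sketch with assumed names; yaw is the camera's rotation around Y in radians, 0 meaning facing -Z, matching the Three.js convention of cameras looking down -Z):

```javascript
// Convert a WASD input into a world-space movement delta, relative to the
// camera's heading on the XZ plane.
function moveRelativeToCamera(key, yaw, step = 1) {
  // Forward direction for yaw = 0 is (0, -1) on the XZ plane.
  const forwardX = -Math.sin(yaw)
  const forwardZ = -Math.cos(yaw)
  // The right vector is the forward vector rotated -90° around Y.
  const rightX = -forwardZ
  const rightZ = forwardX
  switch (key) {
    case 'w': return { x: forwardX * step, z: forwardZ * step }
    case 's': return { x: -forwardX * step, z: -forwardZ * step }
    case 'a': return { x: -rightX * step, z: -rightZ * step }
    case 'd': return { x: rightX * step, z: rightZ * step }
  }
}
```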
Some people (f.e. me) may be unable to get Oculus Quest developer mode working, and unable to inspect web contexts with Chrome devtools from a desktop/laptop.
To work around this, a dev deployment (f.e. on nodechef) will be useful as a way to access the app over HTTPS rather than localhost (developer mode is needed for the headset to access a desktop/laptop's localhost), considering that the WebXR API works only on HTTPS or localhost URLs (a security limitation of the WebXR API in browsers).
This is WIP: currently, holding Shift makes the camera keep going downward, but instead we want the camera to go to a lower height while Shift is held, then return to normal height when we let go.
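One way to sketch it (illustrative names and heights; the same per-frame easing could be shared with the WASD movement smoothing):

```javascript
// Target-height approach: Shift only changes the *target* height, and the
// camera's Y is eased toward it each frame, so crouching and standing are
// smooth rather than a constant downward drift.
const STAND_HEIGHT = 1.6 // meters, a typical eye height; adjust to taste
const CROUCH_HEIGHT = 0.8

let targetY = STAND_HEIGHT
const cam = { y: STAND_HEIGHT }

function setCrouch(isShiftHeld) {
  targetY = isShiftHeld ? CROUCH_HEIGHT : STAND_HEIGHT
}

// Call every animation frame.
function updateHeight() {
  cam.y += (targetY - cam.y) * 0.15
}
```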
When the user hits the record button, show the text in the UI. We can show it within the input field, and disable the input field while the user is talking, much like in this demo.
The backend already gets the speech-to-text result, and sends that off to eventually get a response from openai. We just need to show that text in the UI.
The record button currently records for 4 seconds, then we get a whole text result (unlike the above demo that streams the text piece by piece). For now, we can just show the whole text result at once, to start off with. Then, after
we will likely have the ability to stream the text as the user talks to make it more realtime.
Right now, openai receives a prompt, sends back a response, and does not remember any conversation context.
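A sketch of how context could be kept: the message-array shape below matches OpenAI's chat completions API, but the function names and the system prompt are hypothetical stand-ins, and a real version would live in our backend call.

```javascript
// Keep the running conversation and send the whole history each time, so the
// model can refer back to earlier turns. A real version would cap or
// summarize old messages to stay under the model's token limit.
const history = [
  { role: 'system', content: 'You are a helpful in-world character.' },
]

function addUserMessage(text) {
  history.push({ role: 'user', content: text })
}

function addAssistantMessage(text) {
  history.push({ role: 'assistant', content: text })
}

// Hypothetical: whatever server method actually calls OpenAI would receive
// the full history array instead of a lone prompt string.
function buildRequestBody(model) {
  return { model, messages: history }
}
```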
F.e. add a desk or table that the character is standing or sitting behind. Perhaps add a room around the scene so it feels like we are inside a house, or something. For now we can use free 3D models from sites like SketchFab.
Here's a Three.js example with a character sitting in a room for some inspiration:
https://threejs.org/examples/?q=skin#webgl_animation_skinning_ik
`<lume-gltf-model src="....">` elements to load models.

The record button currently records for strictly 4 seconds. Instead, we want to be able to stop recording once no more speech is detected for some amount of time. This will require making some changes to how the Web Audio objects are managed, and I'm not sure exactly how to know when speech has stopped (does Web Audio provide some way? do we need to implement audio processing ourselves? or is there a lib?).
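For the detection part, Web Audio's AnalyserNode can supply time-domain samples, and a simple RMS threshold over them is a common first pass. The thresholds below are guesses to tune, and the analyser wiring is only sketched in comments since it needs a browser; the pure detection logic is what's implemented here.

```javascript
// In the browser, each frame of samples would come from an AnalyserNode, f.e.:
//   const analyser = audioContext.createAnalyser()
//   sourceNode.connect(analyser)
//   const buf = new Float32Array(analyser.fftSize)
//   analyser.getFloatTimeDomainData(buf)
// Only the pure detection logic is implemented below so it's easy to test.

const SILENCE_RMS = 0.01 // amplitude threshold; tune by experiment
const SILENCE_MS = 1500 // stop after 1.5 seconds of quiet

function rms(samples) {
  let sum = 0
  for (const s of samples) sum += s * s
  return Math.sqrt(sum / samples.length)
}

// Feed one analyser frame at a time; returns true when we should stop recording.
function makeSilenceDetector() {
  let quietSince = null
  return function check(samples, nowMs) {
    if (rms(samples) >= SILENCE_RMS) {
      quietSince = null // heard speech, reset the quiet timer
      return false
    }
    if (quietSince === null) quietSince = nowMs
    return nowMs - quietSince >= SILENCE_MS
  }
}
```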
As an alternative, we may want to simply show a countdown timer (f.e. an arc that shrinks around a circle), so the user has a constant but known time limit to say responses.
Aiming to surpass, or at least meet, the level of polish in this chat demo, improve the chat input, submit button, record button, and chat message list.
Demo screenshot:
Our current app:
We don't need to copy the demo, we just need to get it to a similar or better level of polish. Having our own style is better. :)
Notice in the inspiration demo that the record button is not near the chat. Use your discretion for where to place the microphone/record button, feel free to modify the layout if desired, etc. The main goal is to take inspiration from the level of polish, but not literally copy it.
We should include a command to kill the application, so that the user doesn't run two copies of Meteor in the same application directory.
We'll wire up the Meteor backend so that each logged in player broadcasts player state, and we show an "avatar" of each player (multiplayer). For now, we'll show a simple cube for each player's head.
cc @ryyeung
This will be useful for any future meetings as we expand the team. Happy to help and get this started.
I tried to log in to Meteor on the official website with the credentials created in unnamed-experiment, without success. I wanted to check what has been updated there.
For example, after we are able to save conversations to the backend, make it so multiple people in the same VR experience can send messages to the same AI.
Requires:
use event.setPointerCapture()
We will replace AWS and OpenAI stuff with it, f.e. Sapial, Humaic, etc.
cc @rg3l3dr
After completing
use Meteor's reactive database API to store messages and update the UI with a list of chat messages, for now focused only on text input messages.
so that we can automatically send speech when the user has finished speaking.
Or, perhaps make it push-to-talk so that when the user lets go, we know when to send.
Right now the record button records for a hard-coded 4 seconds, and then sends.