Comments (5)
@shubhexists I'll close this now that this is implemented as a standard part of the project
from self-operating-computer.
As far as I can tell, there are two possible implementations for this:
- Change the prompt to ask the model to use cmd + L to jump directly to the browser's address bar.
- Change the prompt to detect whether the active window is a browser and, if it is, use pyautogui to press cmd + L.
I don't know which would be more accurate.
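For reference, the second option could look roughly like the sketch below. This is only an illustration, not code from the project: the `BROWSER_NAMES` list and the `address_bar_hotkey`, `is_browser`, and `focus_address_bar` helpers are hypothetical names, and matching on the window title is an assumed (and fairly crude) way to detect a browser. The only real library call is `pyautogui.hotkey`.

```python
import platform

# Hypothetical list of browser names to look for; not taken from the project.
BROWSER_NAMES = ("chrome", "firefox", "safari", "edge")


def address_bar_hotkey(system=None):
    """Return the key combination that focuses the browser address bar."""
    system = system or platform.system()
    # macOS uses the command key; Windows and Linux use ctrl.
    modifier = "command" if system == "Darwin" else "ctrl"
    return (modifier, "l")


def is_browser(window_title):
    """Rough check: does the window title mention a known browser?"""
    title = window_title.lower()
    return any(name in title for name in BROWSER_NAMES)


def focus_address_bar(window_title, system=None):
    """Press the address-bar hotkey if the window looks like a browser.

    Returns True if the hotkey was sent, False otherwise.
    """
    if not is_browser(window_title):
        return False
    # Imported lazily so the helpers above can be used without a display.
    import pyautogui
    pyautogui.hotkey(*address_bar_hotkey(system))
    return True
```

Calling `focus_address_bar("GitHub - Google Chrome")` would send cmd + L on macOS or ctrl + L elsewhere. How the active window title is obtained is left out, since that part is platform-specific.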
@shubhexists Based on the wording of this section in the README:
We recognize that some operating system functions may be more efficiently executed with hotkeys such as entering the Browser Address bar using command + L rather than by simulating a mouse click at the correct XY location. We plan to make these improvements over time. However, it's important to note that many actions require the accurate selection of visual elements on the screen, necessitating precise XY mouse click locations. A primary focus of this project is to refine the accuracy of determining these click locations. We believe this is essential for achieving a fully self-operating computer in the current technological landscape.
It sounds like the primary vision of the project at the moment is to improve click accuracy. Something that the cursor will likely be doing a lot in this program is moving to the navigation bar in the browser. That is likely why cmd+L / ctrl+L hasn't yet been implemented.
Fine, makes sense :/ We can't get away from the fact that accuracy is more important. These features can be implemented later.
I think that #8 essentially handles this by creating a "command" key system for the prompt. I think this makes sense long term. The goal of this project is to let multi-modal models emulate a human's interaction with the computer as exactly as possible. I still need to review #8.