
Comments (5)

joshbickett commented on May 13, 2024

@shubhexists I'll close this now that this is implemented as a standard part of the project

from self-operating-computer.

shubhexists commented on May 13, 2024

As far as I can tell, there are two possible implementations for this:

  1. Change the prompt so that it asks the model to use cmd + L to jump directly to the address bar.
  2. Change the prompt so that it detects whether the active window is a browser and, if it is, use pyautogui to press cmd + L.

I'm not sure which of the two would be more accurate.
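A minimal sketch of option 2, assuming the name of the active application is already known by some other means. `BROWSER_NAMES`, `address_bar_hotkey`, and `focus_address_bar` are hypothetical names for illustration and are not part of the project; the only real library call is `pyautogui.hotkey`.

```python
import sys

# Hypothetical list of application names treated as browsers.
BROWSER_NAMES = ("chrome", "firefox", "safari", "edge", "brave")


def address_bar_hotkey(platform=sys.platform):
    """Return the key combination that focuses the browser address bar."""
    # macOS uses command + L; Windows and Linux use ctrl + L.
    modifier = "command" if platform == "darwin" else "ctrl"
    return (modifier, "l")


def focus_address_bar(active_app, press=None, platform=sys.platform):
    """Press the hotkey if active_app looks like a browser.

    Returns True when the hotkey was sent, False otherwise.
    """
    if not any(name in active_app.lower() for name in BROWSER_NAMES):
        return False
    if press is None:
        # Imported lazily so the sketch can be loaded without a display.
        import pyautogui
        press = pyautogui.hotkey
    press(*address_bar_hotkey(platform))
    return True
```

Passing `press` explicitly makes it possible to check the logic without sending real key presses; in normal use it can be left out so that `pyautogui.hotkey` is called.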


michaelhhogue commented on May 13, 2024

@shubhexists Based on the wording of this section in the README:

We recognize that some operating system functions may be more efficiently executed with hotkeys such as entering the Browser Address bar using command + L rather than by simulating a mouse click at the correct XY location. We plan to make these improvements over time. However, it's important to note that many actions require the accurate selection of visual elements on the screen, necessitating precise XY mouse click locations. A primary focus of this project is to refine the accuracy of determining these click locations. We believe this is essential for achieving a fully self-operating computer in the current technological landscape.

It sounds like the project's primary focus at the moment is improving click accuracy. Moving the cursor to the browser's navigation bar is something this program will likely do a lot, which is probably why cmd+L / ctrl+L hasn't been implemented yet.


shubhexists commented on May 13, 2024

Fine, makes sense :/ We can't get around the fact that accuracy is more important. These features can be implemented later.


joshbickett commented on May 13, 2024

I think that #8 essentially handles this by creating a "command" key system for the prompt. I think this makes sense long term. The goal of this project is to allow multi-modal models to emulate a human's interaction with the computer as closely as possible. I still need to review #8.

