
memory-cache's People

Contributors

bharattech, johnshaughnessy, katetaylormoz, misslivirose


memory-cache's Issues

Browser extension Settings/Configuration screen

[Mockup: MemoryCache browser extension with a configuration page, demonstrating a settings page addition to the extension]

  • This change would let us reduce the number of 'Save' buttons on the application's home view and use the user's preference when formatting the download
  • The 'Links here' section is meant as a catch-all for relevant information (to be added if we feel the need)
  • Open to suggestions for the label 'Local file location'

Create project landing page

We want to set up a single landing page using GitHub Pages to outline the project's goals, areas of exploration, and links to helpful resources and ways for people to follow along with Memory Cache development.

Update to use GPU-accelerated inference with gpt4all instead of staying CPU-bound

Memory Cache should use an available GPU for inference in order to speed up queries and the derivation of insights from documents.

What I tried so far

I spent a few days last week exploring the differences between the primordial privateGPT version and the latest one. One of the major differences is that newer project updates include support for GPU inference with llama and gpt4all. The challenge I ran into with the newer version is that moving from the older groovy.ggml model (no longer supported, since privateGPT now uses the .gguf format) to llama doesn't produce the same results when ingesting the same local file store and querying it.

This might be a matter of how RAG is implemented, something about how I set things up on my local machine, or a function of model choice.

I made a cursory attempt to resolve this through dependency changes, but I haven't managed to find a version that both runs and supports .ggml together with GPU acceleration. From what I can tell, Nomic introduced GPU support in gpt4all 2.4 (the latest is 2.5+), but it's unclear whether there's a way to get this working cleanly with minimal changes to how my fork of privateGPT uses langchain to import the gpt4all package. It's also unclear to me whether this works on Ubuntu or only via the Vulkan API; I need to do some additional investigation.

I did get CUDA installed and verified that my GPU is properly detected and set up by running the sample projects provided by NVIDIA.

What's next

  • I'm going to test against gpt4all's chat client with snoozy (which uses the same dataset as groovy) and the shared file directory, but there seems to be a sweet spot in the combination of primordial privateGPT + groovy that is challenging to replicate.
  • Branch and start experimenting with upgrading gpt4all and langchain in the primordial privateGPT repo to see if I can get any of it running with the existing groovy.ggml model.
  • Attempt to convert groovy from ggml to gguf using the llama.cpp utility and try to switch from gpt4all to llama, which might be easier than getting a proper CUDA-backed gpt4all working.

Testing

I've been using a highly subjective test to evaluate results:

Prompt: "What is the meaning of a life well-lived?"

Primordial privateGPT + groovy, augmented with my local files, consistently answers this question with some combination of "technology and community". No other model/project combination has replicated that consistency.

Investigate feasibility of making this work in vanilla Firefox

When implementing the feature to allow saving notes as quick text, I was able to do it with the existing downloads API, using a Blob to store the text. Is it possible to do the same for the entire HTML contents of the webpage, so that it would work out of the box with vanilla Firefox instead of a custom build?

Feature Idea: Ingest from Bookmarks folder

The current setup, which requires patching Firefox and adding an extension, got me thinking about other ways to get data out of the browser.

I know it's relatively easy to read a profile's places.sqlite file (which contains the browsing history and bookmarks), so I'd like to submit the following idea:

The user could create a specific bookmark folder, and all the bookmarks put in it would be automatically ingested into the document DB.

That way, no special browser setup would be needed to use Memory-Cache.

I implemented a PoC in the ingest-bookmark branch of my privateGPT fork. The branch adds an ingest_bookmarks.py script, which requires a BOOKMARK_FOLDER environment variable to be defined.
The script reads the default user profile, collects all the bookmarks in that folder, fetches the page content for each bookmark, and ingests the pages into the docs database.

For this PoC the script needs to be run manually and re-imports all the bookmarks each time. It could be improved to ingest only new bookmarks, to allow overriding the selected user profile, and so on.
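The bookmark-reading half of the PoC can be sketched with nothing but the standard library. This is an illustrative sketch, not the actual ingest_bookmarks.py: it assumes the standard Firefox places.sqlite schema (moz_bookmarks rows with type 2 are folders and type 1 are bookmarks, and a bookmark's fk column points at the moz_places row that holds the URL); fetching and ingesting the pages is left out.

```python
import sqlite3

def bookmarks_in_folder(places_db: str, folder_name: str) -> list[str]:
    """Return the URLs of all bookmarks stored in the named folder.

    Opens places.sqlite read-only so a running Firefox is not disturbed.
    """
    con = sqlite3.connect(f"file:{places_db}?mode=ro", uri=True)
    try:
        rows = con.execute(
            """
            SELECT p.url
            FROM moz_bookmarks b
            JOIN moz_bookmarks folder ON b.parent = folder.id
            JOIN moz_places p ON b.fk = p.id
            WHERE folder.type = 2      -- the enclosing folder
              AND folder.title = ?
              AND b.type = 1           -- plain bookmarks only
            """,
            (folder_name,),
        ).fetchall()
        return [url for (url,) in rows]
    finally:
        con.close()
```

From here, the real script would fetch each URL and hand the content to privateGPT's ingest pipeline; keeping a small state file of already-seen URLs would avoid the full re-import on every run.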

Add option to save text notes as .md files

Right now, the extension saves everything as a .txt file. @zfox23 suggested supporting Markdown rather than just .txt. This could be added as an optional setting on the (as-yet nonexistent) settings page, letting users choose which format notes are saved in.
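As a hedged sketch of the idea (all names here are hypothetical, since the settings page doesn't exist yet), the preference could be a simple format-to-extension mapping that the save path consults:

```python
# Hypothetical mapping from the user's saved-notes format preference
# (a future settings-page option) to file extension and MIME type.
NOTE_FORMATS = {
    "txt": ("txt", "text/plain"),
    "md": ("md", "text/markdown"),
}

def note_filename(title: str, fmt: str = "txt") -> str:
    """Build a note filename, falling back to .txt for unknown formats."""
    ext, _mime = NOTE_FORMATS.get(fmt, NOTE_FORMATS["txt"])
    stem = "-".join(title.split()) or "note"
    return f"NOTE-{stem}.{ext}"
```

Adding a new format would then be a one-line change to the mapping rather than another branch in the save logic.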

Don't force this bloat on users, Mozilla. Not Pocket 2.0

Just more bloatware for Mozilla to shove into Firefox, just like Pocket.

No one wants these random AIs; they just want to open a web browser, type chat.openai.com, and talk there. We don't want a goddamn AI in every app and service we use.

End this madness, please. Don't force it upon me in Firefox, where I must use more scripts to remove the bloat.

Persist popup on Firefox extension

If the extension window loses focus, it currently disappears. Persist the extension's state (either while there is text in the notes box, or until the tab is closed) to avoid data loss.

Add a text input field into plugin to quickly save notes

Users may want to add their own annotations (or even quickly take a separate note) to save in their memory cache. We should add a text input field to the Firefox extension that generates a .txt file saved into the directory.

Feature Idea: RSS Feed Integration

I've been musing on the idea of incorporating RSS into MemoryCache. It might be an interesting way to add publishing sources that people want to draw from. At the same time, I'm torn: there's a risk of RSS content pushing a lot of material into the data store that the human never actually reads, which feels like a large factor to me. Something we might want to consider (cc @katetaylormoz) is whether to have an optional separate flow for user-added sources, which would have to be "read" before being saved.

Alternatively, the solution might just be to recommend an RSS feed extension separately, and use the existing document flow.
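If we do go the RSS route, the parsing side is small; the interesting part is the gating. Here is a minimal sketch using only the standard library (the read flag and the review-queue idea are assumptions, not existing MemoryCache behavior):

```python
import xml.etree.ElementTree as ET

def rss_items(feed_xml: str) -> list[dict]:
    """Extract title/link pairs from an RSS 2.0 feed.

    Each item starts as read=False; ingestion into the document
    store would be gated on the user actually opening the item.
    """
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "read": False,  # flip when the user opens it; ingest only then
        }
        for item in root.iter("item")
    ]
```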

Investigate ability to save downloads on one machine and synchronize them later when back on the home network

Scenario: I'm at an offsite and want to save some documents to later be shared with and ingested by my local machine that serves as my main AI workstation.

Some thinking:

  • I don't want the solution to be moving my pages to the cloud, but rather a way of setting up a sync locally on my home network
  • Ideally I can set this up without an account-based solution, using some other form of authentication (maybe per-device?)

Add listener to run_ingest.sh to listen for file changes in symlinked directory

The current script only listens for modifications to the downloads folder, which means that if you manually add a file to a subdirectory in source_documents, it won't trigger the ingest script to run.

This isn't a huge issue, because the file will get picked up the next time a saved document triggers ingestion anyway, but it might be good to have a flag to "listen" more intently for new files, regardless of how they make their way into the cache.
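One stdlib-only way to "listen" more intently is a periodic snapshot diff over the whole source_documents tree, with followlinks=True so symlinked subdirectories are included. This is a hedged sketch of the idea, not the existing run_ingest.sh mechanism:

```python
import os

def snapshot(root: str) -> dict[str, float]:
    """Map every file under root (following symlinks) to its mtime."""
    files = {}
    for dirpath, _dirs, names in os.walk(root, followlinks=True):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                files[path] = os.path.getmtime(path)
            except OSError:
                pass  # file vanished between listing and stat
    return files

def new_or_changed(before: dict, after: dict) -> list[str]:
    """Files that appeared or were modified between two snapshots."""
    return sorted(p for p, mtime in after.items()
                  if before.get(p) != mtime)
```

A watcher loop would sleep a few seconds between snapshots and trigger the ingest script whenever new_or_changed returns anything; an inotify-based tool would be more efficient but is platform-specific.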

File names - Issue with 'Save as PDF' functionality

Issue:
The file name defaults to 'PAGE-[object Promise]' for every PDF saved.

Preferred Behaviour (or similar):
PAGE-Carl-Sagan---Wikiquote-2024-01-23.html
PAGE-Carl-Sagan---Wikiquote-2024-01-23.pdf

Notes:
The priority is a human-readable filename.
A nice-to-have would be for the HTML and PDF to share the same name with different extensions.
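The '[object Promise]' text is the classic symptom of interpolating an un-awaited JavaScript Promise into a string, so the fix on the extension side is likely to await whatever produces the page title. The desired naming scheme itself is easy to pin down; here is a sketch in Python (names are illustrative, not the extension's actual code) that reproduces the preferred example above:

```python
import re
from datetime import date

def page_filename(title: str, ext: str, when: date) -> str:
    """Build a human-readable filename such as
    PAGE-Carl-Sagan---Wikiquote-2024-01-23.pdf from a page title.

    Characters that are unsafe in filenames are dropped and spaces
    become hyphens; reusing the same stem for the .html and .pdf
    variants keeps the pair associated.
    """
    stem = re.sub(r"[^A-Za-z0-9 \-]", "", title).replace(" ", "-")
    return f"PAGE-{stem or 'untitled'}-{when.isoformat()}.{ext}"
```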
