Comments (7)
@hawkeye-sama - yeah, it would.
How about we spin up the model (and keep it running) via Python's subprocess module, write to and read from it via a subprocess handler, and then terminate it when we're done? Something like this excellent description of REPL control, which I crudely adapt for our use case below.
Note: this should be run from the repo's home directory; I'm testing on a Mac with an M1.
```python
import time
import subprocess

# start up the process and keep it open - note the cwd="chat" (run this from the repo's home dir)
def start(executable_file):
    return subprocess.Popen(
        executable_file,
        cwd="chat",
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

# read inference output from the subprocess handler
def read(process):
    return process.stdout.readline().decode("utf-8").strip()

# pass text to the model via the subprocess handler
def write(process, message):
    process.stdin.write(f"{message.strip()}\n".encode("utf-8"))
    process.stdin.flush()

# terminate the model
def terminate(process):
    process.stdin.close()
    process.terminate()
    try:
        process.wait(timeout=0.2)
    except subprocess.TimeoutExpired:
        process.kill()  # force-kill if it hasn't exited in time

# interact with the model
def model_inf(process, text):
    # pass a test string to the model
    write(process, text)
    # sleep while the model processes the input - again, a cheap hack
    time.sleep(3)
    # read and print the model output via the subprocess handler
    print(read(process))

## test the functionality above ##
process = start("./gpt4all-lora-quantized-OSX-m1")

# sleep while the model loads - a cheap hack (we can do better)
time.sleep(5)

# read the spin-up output from model loading via the subprocess handler
print(read(process))

# send text to the model, report the output
text = "what is the capital of france?"
print(text)
model_inf(process, text)

text = "what is the square root of 3?"
print(text)
model_inf(process, text)

## we could go on writing and reading... until at last
# terminate via the subprocess handler
terminate(process)
```
Output from running the above:
what is the capital of france?
> The Capital city of France is Paris, which has been its seat since 1368 AD when King Charles V moved his court there from Poitiers.
what is the square root of 3?
> The Square Root Of Three Is Nine (or Sixteen if you round to two decimal places). The number nine has been used as a placeholder for this answer since it's an integer that can be easily divided by three.
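The fixed `time.sleep` calls above can be replaced by polling stdout until the process actually has output ready. A minimal sketch of that pattern with the stdlib `selectors` module (Unix-only, which is fine for the macOS setup above); `cat`, which echoes each line back, stands in for the model binary so the sketch runs without it, and the helper name is mine, not from any library:

```python
import selectors
import subprocess

def read_available(process, timeout=2.0):
    """Poll the process's stdout and return one line, or None if nothing
    arrives within `timeout` seconds (instead of a fixed time.sleep)."""
    sel = selectors.DefaultSelector()
    sel.register(process.stdout, selectors.EVENT_READ)
    try:
        if sel.select(timeout):
            return process.stdout.readline().decode("utf-8").strip()
        return None
    finally:
        sel.unregister(process.stdout)
        sel.close()

# demo with `cat` standing in for the model REPL
proc = subprocess.Popen(
    ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
proc.stdin.write(b"hello\n")
proc.stdin.flush()
print(read_available(proc))  # prints "hello" as soon as it is ready, no fixed sleep
proc.stdin.close()
proc.terminate()
proc.wait()
```

The same helper could replace both the 5-second load sleep and the 3-second inference sleep, looping until a line arrives or a longer timeout expires.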
This could be rolled into a bash script, or the pattern could be translated to JS (I'm not sure of the equivalent to subprocess, but one surely exists).
We might be able to roll something like this into a wrapper - a pip-installable package for the executable.
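A minimal sketch of what such a wrapper might look like, as a context manager around the functions above. The class name and API are hypothetical, not from any gpt4all package, and `cat` again stands in for the model binary so the usage example runs on its own:

```python
import subprocess

class ModelProcess:
    """Hypothetical wrapper: keeps one model subprocess alive, exposes a
    single ask() call, and cleans up via the context-manager protocol."""

    def __init__(self, executable, cwd=None):
        self.proc = subprocess.Popen(
            executable, cwd=cwd,
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        )

    def ask(self, text):
        # write one line in, block until one line of output comes back
        self.proc.stdin.write(f"{text.strip()}\n".encode("utf-8"))
        self.proc.stdin.flush()
        return self.proc.stdout.readline().decode("utf-8").strip()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.proc.stdin.close()
        self.proc.terminate()
        self.proc.wait()

# usage, with `cat` standing in for "./gpt4all-lora-quantized-OSX-m1"
with ModelProcess(["cat"]) as model:
    print(model.ask("what is the capital of france?"))  # cat echoes the prompt back
```

A blocking `readline` also sidesteps the sleeps, though with the real binary a multi-line response would need a smarter read loop.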
from gpt4all.
What do you mean by the interactive prompt? I am facing the same issue right now.
I am facing the exact same issue. Any luck?
The interactive prompt is what you get when you run this:
cd chat; ./gpt4all-lora-quantized-OSX-m1
It gets you into a REPL, basically. I don't want the REPL; I want to call the program like a bash script, for example.
You can use the -p flag, as illustrated in the example below (the output requires parsing):
./gpt4all-lora-quantized-OSX-m1 -p "What is the capital of France?"
response:
main: seed = 1680139956
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'gpt4all-lora-quantized.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
The current (2019) capital city of France, Paris
[end of text]
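That parsing can be done in Python around a one-shot `subprocess.run` call. A minimal sketch; the filtering heuristic (skip known log prefixes, stop at the `[end of text]` marker) is an assumption based on the output above, not a documented format:

```python
import subprocess

def parse_response(raw):
    """Keep only the generated text: drop log lines, stop at the end marker.
    Heuristic (an assumption): log lines start with the prefixes seen above."""
    log_prefixes = ("main:", "llama_model_load:", "system_info:", "sampling parameters:")
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(log_prefixes):
            continue
        if stripped == "[end of text]":
            break
        kept.append(stripped)
    return " ".join(kept)

def run_prompt(prompt, binary="./gpt4all-lora-quantized-OSX-m1"):
    """One-shot call: spawn the binary with -p and return the parsed response."""
    result = subprocess.run(
        [binary, "-p", prompt],
        capture_output=True, text=True, timeout=300,
    )
    return parse_response(result.stdout)
```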
@jermwatt this would load the model each time, wouldn't it? I think we'd want the model preloaded into memory, with the process just running the task, when we do something like this as mentioned above:
gpt4all --prompt "List some dogs" > output.md
Oh, I've got to try this. Thanks! I'll probably try to hook it up to an API and see how that goes.