Comments (7)
@hawkeye-sama - yeah, it would.
How about we spin up the model (and keep it running) via Python's subprocess module, write to and read from it via a subprocess handler, and then terminate it when we're done? Something like this excellent description of REPL control, which I crudely adapt for our use case below.
Note: this should be run from the repo's home directory; I'm testing on a Mac with an M1.
```python
import time
import subprocess

# start up the process and keep it open - note the cwd="chat" (run this from the repo's home dir)
def start(executable_file):
    return subprocess.Popen(
        executable_file,
        cwd="chat",
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

# read inference output from the subprocess handler
def read(process):
    return process.stdout.readline().decode("utf-8").strip()

# pass text to the model via the subprocess handler
def write(process, message):
    process.stdin.write(f"{message.strip()}\n".encode("utf-8"))
    process.stdin.flush()

# terminate the model
def terminate(process):
    process.stdin.close()
    process.terminate()
    try:
        process.wait(timeout=0.2)
    except subprocess.TimeoutExpired:
        process.kill()  # force-kill if it hasn't exited in time

# interact with the model
def model_inf(process, text):
    # pass a test string to the model
    write(process, text)
    # sleep while the model processes the input - again, a cheap hack
    time.sleep(3)
    # read and print the model output via the subprocess handler
    print(read(process))

## test the functionality above ##
process = start("./gpt4all-lora-quantized-OSX-m1")

# sleep while the model loads - a cheap hack (we can do better)
time.sleep(5)

# read the spin-up output from model loading via the subprocess handler
print(read(process))

# send text to the model, report the output
text = "what is the capital of france?"
print(text)
model_inf(process, text)

text = "what is the square root of 3?"
print(text)
model_inf(process, text)

## we could go on writing and reading... until at last
# terminate via the subprocess handler
terminate(process)
```
Output from running the above:
what is the capital of france?
> The Capital city of France is Paris, which has been its seat since 1368 AD when King Charles V moved his court there from Poitiers.
what is the square root of 3?
> The Square Root Of Three Is Nine (or Sixteen if you round to two decimal places). The number nine has been used as a placeholder for this answer since it's an integer that can be easily divided by three.
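The fixed `time.sleep` calls above can be replaced by polling stdout until the process actually has output ready. A minimal sketch of that pattern with the stdlib `selectors` module (Unix-only, which is fine for the macOS setup above); `cat`, which echoes each line back, stands in for the model binary so the sketch runs without it, and the helper name is mine, not from any library:

```python
import selectors
import subprocess

def read_available(process, timeout=2.0):
    """Poll the process's stdout and return one line, or None if nothing
    arrives within `timeout` seconds (instead of a fixed time.sleep)."""
    sel = selectors.DefaultSelector()
    sel.register(process.stdout, selectors.EVENT_READ)
    try:
        if sel.select(timeout):
            return process.stdout.readline().decode("utf-8").strip()
        return None
    finally:
        sel.unregister(process.stdout)
        sel.close()

# demo with `cat` standing in for the model REPL
proc = subprocess.Popen(
    ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
proc.stdin.write(b"hello\n")
proc.stdin.flush()
print(read_available(proc))  # prints "hello" as soon as it is ready, no fixed sleep
proc.stdin.close()
proc.terminate()
proc.wait()
```

The same helper could replace both the 5-second load sleep and the 3-second inference sleep, looping until a line arrives or a longer timeout expires.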
This could be rolled into a bash script, or the pattern could be translated to JS (I'm not sure of the equivalent to subprocess, but one surely exists).
We might be able to roll something like this into a wrapper - a pip-installable package for the executable.
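A minimal sketch of what such a wrapper might look like, as a context manager around the functions above. The class name and API are hypothetical, not from any gpt4all package, and `cat` again stands in for the model binary so the usage example runs on its own:

```python
import subprocess

class ModelProcess:
    """Hypothetical wrapper: keeps one model subprocess alive, exposes a
    single ask() call, and cleans up via the context-manager protocol."""

    def __init__(self, executable, cwd=None):
        self.proc = subprocess.Popen(
            executable, cwd=cwd,
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        )

    def ask(self, text):
        # write one line in, block until one line of output comes back
        self.proc.stdin.write(f"{text.strip()}\n".encode("utf-8"))
        self.proc.stdin.flush()
        return self.proc.stdout.readline().decode("utf-8").strip()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.proc.stdin.close()
        self.proc.terminate()
        self.proc.wait()

# usage, with `cat` standing in for "./gpt4all-lora-quantized-OSX-m1"
with ModelProcess(["cat"]) as model:
    print(model.ask("what is the capital of france?"))  # cat echoes the prompt back
```

A blocking `readline` also sidesteps the sleeps, though with the real binary a multi-line response would need a smarter read loop.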
from gpt4all.
What do you mean by the interactive prompt? I am facing the same issue right now.
I am facing the exact same issue. Any luck?
The interactive prompt is what you get when you run this:
cd chat; ./gpt4all-lora-quantized-OSX-m1
It gets you into a REPL, basically. I don't want the REPL; I want to call the program like a bash script, for example.
You can use the -p flag, as illustrated in the example below (the output requires parsing):
./gpt4all-lora-quantized-OSX-m1 -p "What is the capital of France?"
response:
main: seed = 1680139956
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'gpt4all-lora-quantized.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
The current (2019) capital city of France, Paris
[end of text]
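That parsing can be done in Python around a one-shot `subprocess.run` call. A minimal sketch; the filtering heuristic (skip known log prefixes, stop at the `[end of text]` marker) is an assumption based on the output above, not a documented format:

```python
import subprocess

def parse_response(raw):
    """Keep only the generated text: drop log lines, stop at the end marker.
    Heuristic (an assumption): log lines start with the prefixes seen above."""
    log_prefixes = ("main:", "llama_model_load:", "system_info:", "sampling parameters:")
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(log_prefixes):
            continue
        if stripped == "[end of text]":
            break
        kept.append(stripped)
    return " ".join(kept)

def run_prompt(prompt, binary="./gpt4all-lora-quantized-OSX-m1"):
    """One-shot call: spawn the binary with -p and return the parsed response."""
    result = subprocess.run(
        [binary, "-p", prompt],
        capture_output=True, text=True, timeout=300,
    )
    return parse_response(result.stdout)
```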
@jermwatt this would load the model each time, wouldn't it? I think we'd want the model preloaded into memory, with the process just running the task, when we do something like this as mentioned above:
gpt4all --prompt "List some dogs" > output.md
Oh, I've got to try this. Thanks! I'll probably try to hook it up to an API and see how that goes.