kg_rag's Issues

Use other knowledge graphs

Thanks for your work. If I have a simple CSV file of a knowledge graph that I want to use with KG_RAG, how can I plug it into the code concisely and quickly, and what do I need to change?
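A minimal sketch of one way to approach this, assuming the CSV is a simple edge list: turn each triple into a natural-language sentence and group the sentences by node, producing a node-context table similar to the one KG_RAG retrieves from. The column names (subject, predicate, object) and the output column names are illustrative assumptions; the exact columns expected by kg_rag/utility.py and the paths in config.yaml still need to be checked against your checkout, and the disease-name vector DB would have to be rebuilt over your node names.

    # Sketch only: build per-node context sentences from an edge-list CSV.
    # Column names below are assumptions, not part of KG_RAG.
    import pandas as pd

    edges = pd.read_csv("my_kg.csv")  # assumed columns: subject, predicate, object

    # One sentence per triple, then concatenate all sentences for each node.
    edges["context"] = edges["subject"] + " " + edges["predicate"] + " " + edges["object"] + "."
    node_context_df = (
        edges.groupby("subject")["context"]
        .apply(" ".join)
        .reset_index()
        .rename(columns={"subject": "node_name", "context": "node_context"})
    )
    node_context_df.to_csv("my_kg_node_context.csv", index=False)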

No response from SPOKE API

Hello!
I have been trying to get a response from the SPOKE server and keep getting Error 500. It has been happening on and off; the server seems to be down. Any idea when it will be resolved?
Best,
Janet

(Attached screenshot: Spoke_server_down)
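A quick way to tell whether this is a server-side outage is to probe the API with a bounded retry before starting a full run. A minimal sketch, assuming the base URL is the one configured in your config.yaml (the URL below is a placeholder, not the real endpoint):

    import time
    import requests

    SPOKE_BASE = "https://spoke.example.org"  # placeholder; use the base URI from your config.yaml

    for attempt in range(5):
        try:
            resp = requests.get(SPOKE_BASE, timeout=10)
            print(f"attempt {attempt + 1}: HTTP {resp.status_code}")
            if resp.status_code < 500:
                break  # reachable and not a server-side error
        except requests.RequestException as exc:
            print(f"attempt {attempt + 1}: {exc}")
        time.sleep(2 ** attempt)  # back off before probing again

A persistent 500 is a server-side failure, so there is little to do client-side beyond waiting or contacting the SPOKE maintainers.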

Error in retrieving context

Hello,

I am encountering the context retrieval error below while running KG_RAG. What could be the possible solution?
Below are two examples:


Example 1:
(kg_rag) jjoy@jjoy:~/sulab_projects/KG_RAG$ python -m kg_rag.rag_based_generation.GPT.text_generation -g "gpt-4"
Enter your question : what gene is associated with hypochondrogenesis?
Retrieving context from SPOKE graph...
Traceback (most recent call last):
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 56, in
main()
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 44, in main
context = retrieve_context(question, vectorstore, embedding_function_for_context_retrieval, node_context_df, CONTEXT_VOLUME, QUESTION_VS_CONTEXT_SIMILARITY_PERCENTILE_THRESHOLD, QUESTION_VS_CONTEXT_MINIMUM_SIMILARITY)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/utility.py", line 254, in retrieve_context
node_hits.append(node_search_result[0][0].page_content)
IndexError: list index out of range


Example 2:
(kg_rag) jjoy@jjoy:~/sulab_projects/KG_RAG$ python -m kg_rag.rag_based_generation.GPT.text_generation -i True -g "gpt-4"

Enter your question : Are there any genes that are commonly shared between parkinsons disease and rem sleep disorder?

Press enter for Step 1 - Disease entity extraction using GPT-3.5-Turbo
Processing ...
Extracted entity from the prompt = 'Parkinson's disease, REM sleep disorder'

Press enter for Step 2 - Match extracted Disease entity to SPOKE nodes
Finding vector similarity ...
Traceback (most recent call last):
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjoy/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 56, in
main()
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 51, in main
interactive(question, vectorstore, node_context_df, embedding_function_for_context_retrieval, CHAT_MODEL_ID)
File "/home/jjoy/sulab_projects/KG_RAG/kg_rag/utility.py", line 313, in interactive
node_hits.append(node_search_result[0][0].page_content)
IndexError: list index out of range

(Attached screenshot: error)

Thanks for your help!
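The IndexError is raised when the vector-store similarity search returns no hits for an extracted entity and the code indexes the empty result anyway. Below is a hedged sketch of a defensive version of that lookup, assuming the store exposes similarity_search_with_score as in LangChain's Chroma wrapper; the guard is a suggestion, not existing KG_RAG code.

    def safe_node_hits(vectorstore, entities):
        """Return the best vector-store match per entity, skipping entities
        with no hits instead of raising IndexError."""
        hits = []
        for entity in entities:
            result = vectorstore.similarity_search_with_score(entity, k=1)
            if result:  # guard against an empty hit list
                hits.append(result[0][0].page_content)
            else:
                print(f"No vector-store match for '{entity}'; skipping it.")
        return hits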

disease_nodes_db

Hi,

I have a question about the vector_db_path in config.yaml. What is the disease_nodes_db referred to there? Thank you.
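As far as the config goes, vector_db_path appears to point at the persisted Chroma database of disease node names that KG_RAG uses for entity matching, which is what the disease_nodes_db directory holds. A sketch for checking that the path loads, with the config key names shown as assumptions (verify them against your config.yaml):

    import yaml
    from langchain.vectorstores import Chroma
    from langchain.embeddings import SentenceTransformerEmbeddings

    with open("config.yaml") as fh:
        config = yaml.safe_load(fh)

    embedding = SentenceTransformerEmbeddings(
        model_name=config["VECTOR_DB_SENTENCE_EMBEDDING_MODEL"]  # key name assumed
    )
    vectorstore = Chroma(
        persist_directory=config["VECTOR_DB_PATH"],  # key name assumed
        embedding_function=embedding,
    )
    print(vectorstore.similarity_search_with_score("parkinson's disease", k=1))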

Error in retrieving context for some diseases

Hi @karthiksoman,

I am trying to run the true_false_generation notebook and came across this error where it's not able to retrieve context from SPOKE for some diseases.

for index, row in question_df.iterrows():
    question = row["text"]
    context =  retrieve_context(row["text"], vectorstore, embedding_function_for_context_retrieval, node_context_df, CONTEXT_VOLUME, QUESTION_VS_CONTEXT_SIMILARITY_PERCENTILE_THRESHOLD, QUESTION_VS_CONTEXT_MINIMUM_SIMILARITY)
    # print few context lines
    context_lines = context.split("\n")[:3]
    print(context_lines)

E.g., for the question/disease "Neurofibromatosis 2 is not associated with Gene NF2", it fails with the following error:


IndexError Traceback (most recent call last)
File ~/miniconda3/envs/kg_rag/lib/python3.10/site-packages/tenacity/__init__.py:382, in Retrying.__call__(self, fn, *args, **kwargs)
381 try:
--> 382 result = fn(*args, **kwargs)
383 except BaseException: # noqa: B902

File ~/sulab_projects/KG_RAG/kg_rag/utility.py:125, in get_context_using_spoke_api(node_value)
124 context = merge_2['context'].str.cat(sep=' ')
--> 125 context += node_value + " has a " + node_context[0]["data"]["properties"]["source"] + " identifier of " + node_context[0]["data"]["properties"]["identifier"] + " and Provenance of this association is " + node_context[0]["data"]["properties"]["source"] + "."
126 return context

IndexError: list index out of range

The above exception was the direct cause of the following exception:

RetryError Traceback (most recent call last)
Cell In[132], line 3
1 for index, row in question_df.iterrows():
2 question = row["text"]
----> 3 context = retrieve_context(row["text"], vectorstore, embedding_function_for_context_retrieval, node_context_df, CONTEXT_VOLUME, QUESTION_VS_CONTEXT_SIMILARITY_PERCENTILE_THRESHOLD, QUESTION_VS_CONTEXT_MINIMUM_SIMILARITY)
4 # find context first few lines and last few lines
5 context_lines = context.split("\n")[:3]

Cell In[79], line 15
...
--> 326 raise retry_exc from fut.exception()
328 if self.wait:
329 sleep = self.wait(retry_state)

RetryError: RetryError[<Future at 0x7fa2361b66e0 state=finished raised IndexError>]
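The failing line in get_context_using_spoke_api assumes the API always returns at least one attribute record for the node. Here is a hedged sketch of a guard that only appends the identifier/provenance sentence when records are present; the helper name is hypothetical and the dictionary layout simply mirrors the traceback above.

    def append_node_provenance(context, node_value, node_context):
        """Append the identifier/provenance sentence only when SPOKE returned
        attribute records for the node, avoiding the IndexError above."""
        if not node_context:
            return context  # nothing to add; the caller can log or skip this node
        props = node_context[0]["data"]["properties"]
        return (
            context
            + f"{node_value} has a {props['source']} identifier of "
            + f"{props['identifier']} and Provenance of this association is "
            + f"{props['source']}."
        )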

run time

It has been running all night with no results:

def fetch_GPT_response(instruction, system_prompt, chat_model_id, chat_deployment_id, temperature=0):
    print('Calling OpenAI...')
    print("1.4\n")
    response = openai.ChatCompletion.create(
        temperature=temperature,
        deployment_id=chat_deployment_id,
        model=chat_model_id,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": instruction}
        ]
    )
    print("1.5\n")
    if ('choices' in response
            and isinstance(response['choices'], list)
            and len(response['choices']) > 0
            and 'message' in response['choices'][0]
            and 'content' in response['choices'][0]['message']):
        return response['choices'][0]['message']['content']
    else:
        return 'Unexpected response'

after print("1.4\n"), the operation is stuck. No result

meta-llama

The Hugging Face website does not have meta/Llama-2-13b-chat-hf; where can I get it?
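The model is published under the meta-llama organization as meta-llama/Llama-2-13b-chat-hf and is gated, so access has to be requested on the model page first. Once access is granted, loading looks roughly like the sketch below; the HF_TOKEN environment variable is an assumption about how the access token is stored, and older transformers versions use use_auth_token instead of token.

    import os
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-13b-chat-hf"
    token = os.environ.get("HF_TOKEN")  # access token from your Hugging Face account settings

    tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
    # Loading the 13B weights needs substantial RAM/GPU memory.
    model = AutoModelForCausalLM.from_pretrained(model_id, token=token)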

errors when installing dependencies

When I run "pip install -r requirements.txt, there is an error like that:

ERROR: Ignored the following versions that require a different python version: 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10
ERROR: Could not find a version that satisfies the requirement triton (from versions: none)
ERROR: No matching distribution found for triton

This is likely due to a mismatch between the Python version and Triton, so I would like to ask: is it possible to run with an older version of Python? Or could requirements.txt be updated?
Thank you very much!
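As a quick check before reinstalling, the sketch below prints the interpreter version and platform; the "Requires-Python" lines usually mean the interpreter is outside the range a pinned package supports, and triton wheels are generally published for Linux only. The Python 3.10 target is an assumption taken from the KG_RAG setup notes; adjust it to whatever the README pins.

    import platform
    import sys

    print("python  :", sys.version.split()[0])
    print("platform:", platform.system(), platform.machine())

    if sys.version_info[:2] != (3, 10):  # assumed target version; check the README
        print("Warning: not Python 3.10; recreate the conda env with the pinned version.")
    if platform.system() != "Linux":
        print("Warning: triton wheels are generally Linux-only, so pip may find no distribution here.")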

issues with key format

May I know if my format for the API key is correct?
I use API_KEY=XXXXXX and put the key directly in /workspaces/KG_RAG/.gpt_config.env.
However, after finishing the setup and building the vector DB, I cannot call OpenAI.
P.S. My key is correct; I also tried generating a new key, but that did not work either.

The error is:
(kg_rag) @cswangxiaowei ➜ /workspaces/KG_RAG (main) $ python -m kg_rag.rag_based_generation.GPT.text_generation -g "gpt-4"

Enter your question : who are you?
Retrieving context from SPOKE graph...
Calling OpenAI...
Calling OpenAI...
Calling OpenAI...
Calling OpenAI...
Calling OpenAI...
Traceback (most recent call last):
File "/home/codespace/.local/lib/python3.10/site-packages/tenacity/init.py", line 382, in call
result = fn(*args, **kwargs)
File "/workspaces/KG_RAG/kg_rag/utility.py", line 183, in fetch_GPT_response
response = openai.ChatCompletion.create(
File "/home/codespace/.local/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "/home/codespace/.local/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 151, in create
) = cls.__prepare_create_request(
File "/home/codespace/.local/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 108, in __prepare_create_request
requestor = api_requestor.APIRequestor(
File "/home/codespace/.local/lib/python3.10/site-packages/openai/api_requestor.py", line 139, in init
self.api_key = key or util.default_api_key()
File "/home/codespace/.local/lib/python3.10/site-packages/openai/util.py", line 186, in default_api_key
raise openai.error.AuthenticationError(
openai.error.AuthenticationError: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>', or you can set the environment variable OPENAI_API_KEY=<API-KEY>). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = <PATH>'. You can generate API keys in the OpenAI web interface. See https://platform.openai.com/account/api-keys for details.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/conda/envs/kg_rag/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/kg_rag/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspaces/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 56, in
main()
File "/workspaces/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 44, in main
context = retrieve_context(question, vectorstore, embedding_function_for_context_retrieval, node_context_df, CONTEXT_VOLUME, QUESTION_VS_CONTEXT_SIMILARITY_PERCENTILE_THRESHOLD, QUESTION_VS_CONTEXT_MINIMUM_SIMILARITY)
File "/workspaces/KG_RAG/kg_rag/utility.py", line 248, in retrieve_context
entities = disease_entity_extractor_v2(question)
File "/workspaces/KG_RAG/kg_rag/utility.py", line 232, in disease_entity_extractor_v2
resp = get_GPT_response(prompt_updated, system_prompts["DISEASE_ENTITY_EXTRACTION"], chat_model_id, chat_deployment_id, temperature=0)
File "/home/codespace/.local/lib/python3.10/site-packages/joblib/memory.py", line 655, in call
return self._cached_call(args, kwargs)[0]
File "/home/codespace/.local/lib/python3.10/site-packages/joblib/memory.py", line 598, in _cached_call
out, metadata = self.call(*args, **kwargs)
File "/home/codespace/.local/lib/python3.10/site-packages/joblib/memory.py", line 856, in call
output = self.func(*args, **kwargs)
File "/workspaces/KG_RAG/kg_rag/utility.py", line 203, in get_GPT_response
return fetch_GPT_response(instruction, system_prompt, chat_model_id, chat_deployment_id, temperature)
File "/home/codespace/.local/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/home/codespace/.local/lib/python3.10/site-packages/tenacity/init.py", line 379, in call
do = self.iter(retry_state=retry_state)
File "/home/codespace/.local/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f5c931e94b0 state=finished raised AuthenticationError>]
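The AuthenticationError means the openai module never saw a key, so the .gpt_config.env file is either not being loaded or uses a variable name the code does not read. A hedged way to verify both, using the path and variable name from this report (confirm that API_KEY is the name kg_rag/utility.py actually reads from the config):

    import os
    import openai
    from dotenv import load_dotenv

    load_dotenv("/workspaces/KG_RAG/.gpt_config.env")

    api_key = os.environ.get("API_KEY")
    print("API_KEY loaded:", bool(api_key))  # False means the path or variable name is wrong

    openai.api_key = api_key  # pre-1.0 openai SDK; setting it explicitly rules out env loading issues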

OSError: [Errno 101] Network is unreachable

When I try to run python -m kg_rag.rag_based_generation.GPT.text_generation -g "gpt-4", I get stuck at Step 1 with the following error:
Enter your question : Are there any genes that are commonly shared by parkinsons disease and rem sleep disorder?

Press enter for Step 1 - Disease entity extraction using GPT-3.5-Turbo
Processing ...
Traceback (most recent call last):
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connection.py", line 203, in _new_conn
sock = connection.create_connection(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connectionpool.py", line 790, in urlopen
response = self._make_request(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1092, in _validate_conn
conn.connect()
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connection.py", line 611, in connect
self.sock = sock = self._new_conn()
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connection.py", line 218, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fc6a4a419c0>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connectionpool.py", line 874, in urlopen
return self.urlopen(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connectionpool.py", line 874, in urlopen
return self.urlopen(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/connectionpool.py", line 844, in urlopen
retries = retries.increment(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fc6a4a419c0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_requestor.py", line 606, in request_raw
result = _thread_context.session.request(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fc6a4a419c0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/tenacity/init.py", line 382, in call
result = fn(*args, **kwargs)
File "/storeDisk2/lsp/KG_RAG/kg_rag/utility.py", line 183, in fetch_GPT_response
response = openai.ChatCompletion.create(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 155, in create
response, _, api_key = requestor.request(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_requestor.py", line 289, in request
result = self.request_raw(
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in request_raw
raise error.APIConnectionError(
openai.error.APIConnectionError: Error communicating with OpenAI: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fc6a4a419c0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/storeDisk2/lsp/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 56, in
main()
File "/storeDisk2/lsp/KG_RAG/kg_rag/rag_based_generation/GPT/text_generation.py", line 51, in main
interactive(question, vectorstore, node_context_df, embedding_function_for_context_retrieval, CHAT_MODEL_ID)
File "/storeDisk2/lsp/KG_RAG/kg_rag/utility.py", line 303, in interactive
entities = disease_entity_extractor_v2(question)
File "/storeDisk2/lsp/KG_RAG/kg_rag/utility.py", line 232, in disease_entity_extractor_v2
resp = get_GPT_response(prompt_updated, system_prompts["DISEASE_ENTITY_EXTRACTION"], chat_model_id, chat_deployment_id, temperature=0)
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/joblib/memory.py", line 655, in call
return self._cached_call(args, kwargs)[0]
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/joblib/memory.py", line 598, in _cached_call
out, metadata = self.call(*args, **kwargs)
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/joblib/memory.py", line 856, in call
output = self.func(*args, **kwargs)
File "/storeDisk2/lsp/KG_RAG/kg_rag/utility.py", line 203, in get_GPT_response
return fetch_GPT_response(instruction, system_prompt, chat_model_id, chat_deployment_id, temperature)
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/tenacity/init.py", line 379, in call
do = self.iter(retry_state=retry_state)
File "/storeDisk2/lsp/miniconda3/envs/kg_rag/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7fc6a55d7670 state=finished raised APIConnectionError>]
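Errno 101 is raised by the socket layer before any OpenAI logic runs, so the machine simply cannot reach api.openai.com (an offline compute node, a firewall, or a proxy that is not being picked up). A hedged connectivity check that tests the TCP path and prints the proxy variables the requests library would honour:

    import os
    import socket

    for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        print(f"{var} = {os.environ.get(var)}")

    try:
        with socket.create_connection(("api.openai.com", 443), timeout=10):
            print("TCP connection to api.openai.com:443 succeeded")
    except OSError as exc:
        print(f"TCP connection failed: {exc}")  # same error class as in the traceback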
