mem0ai / mem0
The memory layer for Personalized AI
Home Page: https://mem0.ai
License: Apache License 2.0
Some web pages require login or payment; curl is needed to fetch their content. I can create a PR.
Trying to install using pip3 returns this error:
Building wheels for collected packages: hnswlib
Building wheel for hnswlib (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for hnswlib (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [199 lines of output]
running bdist_wheel
running build
running build_ext
creating var
creating var/folders
creating var/folders/8c
creating var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn
creating var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T
x86_64-apple-darwin13.4.0-clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c /var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmp4e6jgsj0.cpp -o var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmp4e6jgsj0.o -std=c++14
x86_64-apple-darwin13.4.0-clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c /var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmpsl27hkck.cpp -o var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmpsl27hkck.o -fvisibility=hidden
building 'hnswlib' extension
creating build
creating build/temp.macosx-13-arm64-cpython-311
creating build/temp.macosx-13-arm64-cpython-311/python_bindings
x86_64-apple-darwin13.4.0-clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include -I/private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include -I/opt/homebrew/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/numpy/core/include -I./hnswlib/ -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c ./python_bindings/bindings.cpp -o build/temp.macosx-13-arm64-cpython-311/./python_bindings/bindings.o -O3 -stdlib=libc++ -mmacosx-version-min=10.7 -DVERSION_INFO=\"0.7.0\" -std=c++14 -fvisibility=hidden
In file included from ./python_bindings/bindings.cpp:6:
In file included from ./hnswlib/hnswlib.h:199:
./hnswlib/hnswalg.h:755:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
for (int i = 0; i < dim; i++) {
~ ^ ~~~
./python_bindings/bindings.cpp:102:13: warning: format specifies type 'int' but the argument has type 'pybind11::ssize_t' (aka 'long') [-Wformat]
buffer.ndim);
^~~~~~~~~~~
./python_bindings/bindings.cpp:126:17: warning: format specifies type 'int' but the argument has type 'pybind11::ssize_t' (aka 'long') [-Wformat]
ids_numpy.ndim, feature_rows);
^~~~~~~~~~~~~~
./python_bindings/bindings.cpp:126:33: warning: format specifies type 'int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
ids_numpy.ndim, feature_rows);
^~~~~~~~~~~~
./python_bindings/bindings.cpp:121:58: warning: comparison of integers of different signs: 'std::__vector_base<long, std::allocator<long>>::value_type' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
if (!((ids_numpy.ndim == 1 && ids_numpy.shape[0] == feature_rows) ||
~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
./python_bindings/bindings.cpp:383:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:386:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:389:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:392:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:395:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:647:28: warning: unused variable 'data' [-Wunused-variable]
float* data = (float*)items.data(row);
^
./python_bindings/bindings.cpp:667:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:670:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:853:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:856:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:876:1: warning: 'pybind11_init' is deprecated: PYBIND11_PLUGIN is deprecated, use PYBIND11_MODULE [-Wdeprecated-declarations]
PYBIND11_PLUGIN(hnswlib) {
^
/private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/detail/common.h:432:20: note: expanded from macro 'PYBIND11_PLUGIN'
return pybind11_init(); \
^
./python_bindings/bindings.cpp:876:1: note: 'pybind11_init' has been explicitly marked deprecated here
/private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/detail/common.h:426:5: note: expanded from macro 'PYBIND11_PLUGIN'
PYBIND11_DEPRECATED("PYBIND11_PLUGIN is deprecated, use PYBIND11_MODULE") \
^
/private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/detail/common.h:194:43: note: expanded from macro 'PYBIND11_DEPRECATED'
# define PYBIND11_DEPRECATED(reason) [[deprecated(reason)]]
^
In file included from ./python_bindings/bindings.cpp:6:
In file included from ./hnswlib/hnswlib.h:199:
./hnswlib/hnswalg.h:95:11: warning: field 'link_list_locks_' will be initialized after field 'label_op_locks_' [-Wreorder-ctor]
: link_list_locks_(max_elements),
^
./python_bindings/bindings.cpp:488:39: note: in instantiation of member function 'hnswlib::HierarchicalNSW<float>::HierarchicalNSW' requested here
new_index->appr_alg = new hnswlib::HierarchicalNSW<dist_t>(
^
./python_bindings/bindings.cpp:880:38: note: in instantiation of member function 'Index<float>::createFromParams' requested here
.def(py::init(&Index<float>::createFromParams), py::arg("params"))
^
./python_bindings/bindings.cpp:667:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:892:28: note: in instantiation of member function 'Index<float>::knnQuery_return_numpy' requested here
&Index<float>::knnQuery_return_numpy,
^
./python_bindings/bindings.cpp:670:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:619:22: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
if (rows <= num_threads * 4) {
~~~~ ^ ~~~~~~~~~~~~~~~
./python_bindings/bindings.cpp:257:22: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
if (features != dim)
~~~~~~~~ ^ ~~~
./python_bindings/bindings.cpp:898:28: note: in instantiation of member function 'Index<float>::addItems' requested here
&Index<float>::addItems,
^
./python_bindings/bindings.cpp:261:18: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
if (rows <= num_threads * 4) {
~~~~ ^ ~~~~~~~~~~~~~~~
In file included from ./python_bindings/bindings.cpp:6:
In file included from ./hnswlib/hnswlib.h:199:
./hnswlib/hnswalg.h:755:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
for (int i = 0; i < dim; i++) {
~ ^ ~~~
./python_bindings/bindings.cpp:323:47: note: in instantiation of function template specialization 'hnswlib::HierarchicalNSW<float>::getDataByLabel<float>' requested here
data.push_back(appr_alg->template getDataByLabel<data_t>(id));
^
./python_bindings/bindings.cpp:903:49: note: in instantiation of member function 'Index<float>::getDataReturnList' requested here
.def("get_items", &Index<float, float>::getDataReturnList, py::arg("ids") = py::none())
^
./python_bindings/bindings.cpp:383:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:467:27: note: in instantiation of member function 'Index<float>::getAnnData' requested here
auto ann_params = getAnnData();
^
./python_bindings/bindings.cpp:945:43: note: in instantiation of member function 'Index<float>::getIndexParams' requested here
return py::make_tuple(ind.getIndexParams()); /* Return dict (wrapped in a tuple) that fully encodes state of the Index object */
^
./python_bindings/bindings.cpp:386:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:389:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:392:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:395:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
In file included from ./python_bindings/bindings.cpp:6:
In file included from ./hnswlib/hnswlib.h:198:
./hnswlib/bruteforce.h:105:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
for (int i = 0; i < k; i++) {
~ ^ ~
./hnswlib/bruteforce.h:59:5: note: in instantiation of member function 'hnswlib::BruteforceSearch<float>::searchKnn' requested here
~BruteforceSearch() {
^
./python_bindings/bindings.cpp:748:13: note: in instantiation of member function 'hnswlib::BruteforceSearch<float>::~BruteforceSearch' requested here
delete alg;
^
/Users/acf/opt/anaconda3/bin/../include/c++/v1/memory:1397:5: note: in instantiation of member function 'BFIndex<float>::~BFIndex' requested here
delete __ptr;
^
/Users/acf/opt/anaconda3/bin/../include/c++/v1/memory:1658:7: note: in instantiation of member function 'std::default_delete<BFIndex<float>>::operator()' requested here
__ptr_.second()(__tmp);
^
/Users/acf/opt/anaconda3/bin/../include/c++/v1/memory:1612:19: note: in instantiation of member function 'std::unique_ptr<BFIndex<float>>::reset' requested here
~unique_ptr() { reset(); }
^
/private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/pybind11.h:1872:40: note: in instantiation of member function 'std::unique_ptr<BFIndex<float>>::~unique_ptr' requested here
v_h.holder<holder_type>().~holder_type();
^
/private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/pybind11.h:1535:26: note: in instantiation of member function 'pybind11::class_<BFIndex<float>>::dealloc' requested here
record.dealloc = dealloc;
^
./python_bindings/bindings.cpp:957:9: note: in instantiation of function template specialization 'pybind11::class_<BFIndex<float>>::class_<>' requested here
py::class_<BFIndex<float>>(m, "BFIndex")
^
In file included from ./python_bindings/bindings.cpp:6:
In file included from ./hnswlib/hnswlib.h:198:
./hnswlib/bruteforce.h:113:27: warning: comparison of integers of different signs: 'int' and 'const size_t' (aka 'const unsigned long') [-Wsign-compare]
for (int i = k; i < cur_element_count; i++) {
~ ^ ~~~~~~~~~~~~~~~~~
./python_bindings/bindings.cpp:853:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:960:44: note: in instantiation of member function 'BFIndex<float>::knnQuery_return_numpy' requested here
.def("knn_query", &BFIndex<float>::knnQuery_return_numpy, py::arg("data"), py::arg("k") = 1, py::arg("filter") = py::none())
^
./python_bindings/bindings.cpp:856:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
delete[] f;
^ ~
./python_bindings/bindings.cpp:778:22: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
if (features != dim)
~~~~~~~~ ^ ~~~
./python_bindings/bindings.cpp:961:44: note: in instantiation of member function 'BFIndex<float>::addItems' requested here
.def("add_items", &BFIndex<float>::addItems, py::arg("data"), py::arg("ids") = py::none())
^
In file included from ./python_bindings/bindings.cpp:6:
./hnswlib/hnswlib.h:80:13: warning: unused function 'AVX512Capable' [-Wunused-function]
static bool AVX512Capable() {
^
34 warnings generated.
creating build/lib.macosx-13-arm64-cpython-311
x86_64-apple-darwin13.4.0-clang++ -bundle -undefined dynamic_lookup -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/Users/acf/opt/anaconda3/lib -L/Users/acf/opt/anaconda3/lib -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include build/temp.macosx-13-arm64-cpython-311/./python_bindings/bindings.o -o build/lib.macosx-13-arm64-cpython-311/hnswlib.cpython-311-darwin.so -stdlib=libc++ -mmacosx-version-min=10.7
ld: warning: -pie being ignored. It is only used when linking a main executable
ld: unsupported tapi file type '!tapi-tbd' in YAML file '/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/usr/lib/libSystem.tbd' for architecture x86_64
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
error: command '/Users/acf/opt/anaconda3/bin/x86_64-apple-darwin13.4.0-clang++' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for hnswlib
Failed to build hnswlib
ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects
Embedchain will parse uploaded images, extract the text, and embed it. Example: a screenshot of a book chapter.
The parser package should be configurable; the default should be open source.
Specifically, I'm working with Snowflake, but I would love to be able to select a table, or a set of tables, as a format source from my data warehouse.
Hi
Is there any way to configure the temperature and the model used for OpenAI when running a query?
Thanks.
I would appreciate it if you added Hugging Face embeddings, because they would be free to use, in contrast to OpenAI's embeddings (which use ada, I believe). Something along these lines would be great:
`embeddings_model_name = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)`
Although I must admit that I do not know how this model compares to OpenAI's when it comes to embeddings; if anyone knows what the differences are, please let me know.
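Since Chroma (the vector store embedchain uses) accepts a pluggable embedding function, a local model can be wired in behind a simple list-of-strings to list-of-vectors interface. A minimal sketch of that interface, with a toy hashed bag-of-words stand-in (`toy_embed` is hypothetical; a real setup would call the sentence-transformers model named above instead):

```python
import hashlib

def toy_embed(texts, dim=16):
    """Placeholder with the same shape a real embedding model
    (e.g. sentence-transformers/all-MiniLM-L6-v2) would provide:
    list[str] -> list[list[float]]. Hashed bag-of-words, illustration only."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # Bucket each token into one of `dim` slots via a stable hash.
            h = int(hashlib.md5(token.encode()).hexdigest(), 16)
            vec[h % dim] += 1.0
        vectors.append(vec)
    return vectors

embeddings = toy_embed(["hello world", "hello there"])
```

The key point is only the call shape: any callable with this signature (the Hugging Face one included) could be dropped in where the OpenAI embedding call happens today.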
The max_tokens parameter being set to 1000 is an issue. With multiple sources (with long URLs) and larger webpages, the budget is quickly eaten up. When the token limit is exceeded, no warning is given except from OpenAI:
openai.error.RateLimitError: The server had an error while processing your request. Sorry about that!
def get_openai_answer(self, prompt):
    messages = []
    messages.append({
        "role": "user", "content": prompt
    })
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        temperature=0,
        max_tokens=1000,
        top_p=1,
    )
    return response["choices"][0]["message"]["content"]
I encountered a strange problem: my Python code consists of a single file, and when the name of that file is the same as the name of the library it imports (embedchain.py), an error is raised: ImportError: cannot import name 'App' from partially initialized module 'xxx' (most likely due to a circular import)
So just rename the file to another name, and it will be fixed.
Is it possible to ask the bot articulate questions such as "Given the various documents, can you write the history of..."?
Set up the following project management tools:
pytest and pylint
a Read the Docs server
Docstrings for the API: Google style is recommended.
I can help with the above.
Currently, embedchain allows the addition of various types of data sources, such as YouTube videos, PDF files, and web pages, to be processed and used in the application. This feature request proposes to extend this functionality to include DataFrames, specifically those from the Spark or Pandas libraries, as potential data sources.
DataFrames are a commonly used data structure for handling and manipulating data in Python, especially in data science and machine learning applications. They are particularly effective when dealing with large, structured datasets, which can include text data.
The ability to use DataFrames as a data source would add a significant amount of flexibility to embedchain, as users could directly input their preprocessed and transformed data into the application. This could be beneficial in scenarios where the data is already available in DataFrame format, such as when it has been preprocessed or transformed as part of a larger data pipeline.
The implementation of this feature would involve adding a new method to the App class (or modifying the existing .add() method) that accepts a DataFrame and its format (Spark or Pandas) as arguments. The method would then handle loading the data from the DataFrame into the application in the appropriate format, ready to be processed and used.
This feature would increase the flexibility and usefulness of embedchain, making it applicable to a wider range of scenarios and use cases, and potentially attracting a broader user base. It would also align well with common data science workflows, which often involve the use of DataFrames for data manipulation and analysis.
Please consider adding this feature in a future update of embedchain.
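As a rough illustration of the loader shape, a DataFrame can be flattened to text chunks via its records view, which works for both pandas (`df.to_dict('records')`) and Spark (`[r.asDict() for r in df.collect()]`). `rows_to_documents` is a hypothetical helper, not existing embedchain API:

```python
def rows_to_documents(records, text_columns=None):
    """Flatten DataFrame-style rows (a list of dicts) into text chunks
    ready for chunking/embedding. If text_columns is None, all columns
    are used in sorted order. Hypothetical sketch, not embedchain API."""
    docs = []
    for row in records:
        cols = text_columns or sorted(row)
        docs.append("; ".join(f"{c}: {row[c]}" for c in cols if c in row))
    return docs

docs = rows_to_documents(
    [{"model": "Corvette C1", "hp": 195}],
    text_columns=["model", "hp"],
)
```

Working on the records view keeps the loader itself free of any hard pandas or pyspark dependency; the conversion to records happens at the call site.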
Hi!
Can we use custom LLM and Embedding models with it?
Thanks!
Hi, is there are JS version of EmbedChain (similar to what was done with LangChainJS) in the works?
Thanks for building this!
David
Parameters to specify the OpenAI model and settings.
E.g., I'm subclassing App and updating the model this way to test:
def get_openai_answer(self, prompt):
    messages = []
    messages.append({
        "role": "user", "content": prompt
    })
    response = openai.ChatCompletion.create(
        model="gpt-4-0613",
        messages=messages,
        temperature=0.25,
        max_tokens=1000,
        top_p=1,
    )
    return response["choices"][0]["message"]["content"]
It would be awesome to have a few parameters when querying for temperature, max_tokens, and top_p as well. Or globally/in env? Not sure what's best, but happy to create a PR.
Hi @taranjeet, I was working on a mini project to chat over a small blog, and I found myself writing some code to iterate over the sitemap of the website. I think it would be valuable if we could provide format support for sitemaps to automate loading and chunking multiple web pages. Do you already have an issue tracking that, or is it something that can be added?
Right now I am doing something like this:
# Download the sitemap.xml file from a website and extract all the links
import requests
from bs4 import BeautifulSoup

def get_links(url):
    url = f'{url}/sitemap.xml'
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'lxml')
        links = [link.text for link in soup.find_all('loc')]
        return links
    else:
        print(f'Error: {response.status_code}')
        return None
How do I train the model with my local files? Suppose I have a PDF in the root directory and I want to add it like mygpt.add("pdf_file", "book.pdf"). Is that possible?
Let's talk about this method:
def query(self, input_query):
    """
    Queries the vector database based on the given input query.
    Gets relevant doc based on the query and then passes it to an
    LLM as context to get the answer.

    :param input_query: The query to use.
    :return: The answer to the query.
    """
    result = self.collection.query(
        query_texts=[input_query,],
        n_results=1,
    )
    result_formatted = self._format_result(result)
    answer = self.get_answer_from_llm(input_query, result_formatted[0][0].page_content)
    return answer
As far as I can tell (and I'm just reading, not necessarily understanding; correct me if I'm wrong), it will return only the single closest document, because of n_results=1.
What if we have a more granular database, cut into smaller pieces?
E.g. the webpages and documents we added are only a paragraph long. Then it will only return that one paragraph. So let's keep imagining that a user asks a complex question for which the correct answer is stored in more than one document. Then it would only answer part of the question with limited knowledge.
Here's a simple example. Let's say we are in the car business and feed our database information about the Corvette, one page for each generation. Then a user asks: how much horsepower does the current Corvette make, and how much did the first one make? If my understanding is correct, it could not answer that question (for this specific question ChatGPT knows the answer out of the box, but you get the point).
For these kinds of use cases I'm proposing to allow the retrieval of more than one document, configurable by the user. 1 can stay as the default. These documents are then all passed as context so an LLM can do its magic and process the information.
The downside I can see is that it will require more tokens, and thus cost more. This is a compromise the user has to make for better results. The max token limit should also be considered, especially in cases where the database contains both short and long texts; for this edge case, max tokens should be configurable by the user, and when a limit is set, the tokens of the prompt should be counted and the context cut off if necessary. Edit: OpenAI has a max tokens parameter that does all of this.
P.S. Why are we prompting with prompt = f"""Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. {context} if we only use one piece of context?
I will propose a PR for this.
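The retrieval-plus-budget idea above can be sketched as follows. `build_context` is a hypothetical helper, and tokens are approximated by whitespace-split words; a real implementation would count with a tokenizer such as tiktoken:

```python
def build_context(documents, max_tokens=1000):
    """Greedily pack retrieved documents (closest first) into a context
    string without exceeding a token budget. Word-count approximation of
    tokens; illustrative sketch only."""
    context_parts, used = [], 0
    for doc in documents:
        cost = len(doc.split())
        if used + cost > max_tokens:
            break  # budget exhausted; drop the remaining, less relevant docs
        context_parts.append(doc)
        used += cost
    return "\n\n".join(context_parts)

# Documents as they would come back from collection.query(..., n_results=k),
# ordered by similarity; the least relevant get cut first.
ctx = build_context(["a b c", "d e", "f g h i"], max_tokens=5)
```

With this shape, n_results can default to 1 and larger values only change how many candidates the packer sees.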
This issue is meant to track PR from https://github.com/cachho.
Collating thoughts and final action here first
Currently, embedchain is designed to use OpenAI's API for creating embeddings and leveraging the power of GPT-3 for generating answers in the context of chatbots. This feature request proposes to include the option of using Azure's OpenAI API as an alternative.
Azure, a comprehensive suite of cloud services offered by Microsoft, also provides an implementation of OpenAI API. Integration with Azure's OpenAI API would provide a choice to the users to select between OpenAI's original API and Azure's version based on their specific requirements and preferences.
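For the legacy openai Python SDK (pre-1.0, the version used throughout these snippets), Azure support is mostly module-level configuration. A sketch; the endpoint, API version, and deployment name below are placeholders:

```python
import openai

# Azure OpenAI with the legacy (<1.0) openai SDK.
# All values below are placeholders for illustration.
openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "..."

# Azure routes by deployment name via `engine` rather than `model`:
# response = openai.ChatCompletion.create(engine="my-gpt-35-deployment", messages=[...])
```

Making these settable through embedchain's config would let users switch between the original OpenAI API and Azure's without code changes.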
Can you add this format, please?
This is my code:
import os
os.environ["OPENAI_API_KEY"] = "sk-???"
from embedchain import App
naval_chat_bot = App()
naval_chat_bot.add_local("pdf_file", "docs/masnavi-en.pdf")
print(naval_chat_bot.query("Who is the most powerful man?"))
I get chromadb.errors.DuplicateIDError: Expected IDs to be unique, found duplicates for …. Where is the problem?
P.S.: This was my second attempt. The first one, with a different PDF document, was successful.
When trying to run the sample code I get this:
ImportError: cannot import name 'TypeVar' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)
I am running this in a Databricks notebook.
First off... Great job!!! Simple and tight code. Much appreciate you making/sharing it.
There was one quick suggestion I had: in order to minimize boilerplate code, it would be good to modify the interface to make the file_type variable optional and detect it based on the input content. If the variable is defined, the code would check the file to ensure that it is of the specified type.
This ease-of-life modification should be added early in development to minimize more extensive refactors down the line.
But I wholly understand if you have a different design goal for making this a required input.
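A detection step like the one suggested could be sketched as follows; `detect_file_type` and the extension mapping are hypothetical, not embedchain API:

```python
import os

# Illustrative mapping from file extension to embedchain-style type names.
EXTENSION_MAP = {".pdf": "pdf_file", ".txt": "text", ".docx": "docx"}

def detect_file_type(source):
    """Infer the file_type argument from the source string, so callers
    could omit it. Hypothetical sketch of the proposed behavior."""
    if source.startswith(("http://", "https://")):
        return "web_page"
    ext = os.path.splitext(source)[1].lower()
    if ext in EXTENSION_MAP:
        return EXTENSION_MAP[ext]
    raise ValueError(f"Cannot infer file_type for {source!r}; pass it explicitly.")

kind = detect_file_type("docs/masnavi-en.pdf")
```

An explicitly passed file_type would still win, with detection used only as the fallback, which also covers the "check the file matches the declared type" variant.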
Hi @taranjeet,
Facing issues with rate-limiting and context window limitations.
Would recommend wrapping the base OpenAI call with reliableGPT.
from reliablegpt import reliableGPT
openai.ChatCompletion.create = reliableGPT(openai.ChatCompletion.create, ...)
Wrap a @sveltejs interface around it and make an AI product.
opened on behalf of Twitter user Patrick, tweet link
Please allow the EPUB format as one of the supported types.
Process a database as a data source
The Embedchain class has a lot of methods, and it would add value in terms of code readability to abstract it a little. There are many open issues about integrating multiple LLMs, vector DBs, or embedding models. While I see a level of abstraction in the vector db folder that can be leveraged for further integration options, I believe we should do something similar for the methods where we use the embedding models and the LLM. I have raised PR #92, which abstracts the data formats for loaders and chunkers. @taranjeet @cachho please let me know if this is something we can add, so we can have some further discussions on how to structure it for the more critical pieces like the embedding models and chat completions.
Hi there, I see that the framework is using the latest GPT-3.5 release for prompting.
How can I change to GPT-4?
My Best for this project !
Regards
I hope for support for a custom base_url. This is because the original base_url from OpenAI is sometimes not directly accessible from my region.
How does the framework handle caching? Does it embed everything again and add it to the database each time you run the script, or does it know that a given data source is already embedded and in the database, and therefore there is no need to incur that expense?
Note: This issue is opened on behalf of discord user bodech, message link
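One common way to implement such a check is to derive a deterministic ID from the source content and skip sources whose ID is already stored (embedchain prints "All data from ... already exists in the database." when it detects this, as the snippet below shows). A stdlib-only sketch; `source_id` and `add_once` are hypothetical names:

```python
import hashlib

def source_id(content):
    """Deterministic ID for a data source: the same content always hashes
    to the same ID, so a re-run can detect it and skip the embedding cost."""
    return hashlib.sha256(content.encode()).hexdigest()

# In a real system this set would live in the vector DB, not in memory.
seen = set()

def add_once(content):
    sid = source_id(content)
    if sid in seen:
        return False  # already embedded; skip the API call
    seen.add(sid)
    # ... embed and store `content` here ...
    return True
```

Since the IDs depend only on the content, this also naturally deduplicates the same page added under two different labels.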
import os
from keys import *
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
from embedchain import App
naval_chat_bot = App()
naval_chat_bot.add("web_page", "https://psymplicity.com/")
print(naval_chat_bot.query("what is the three-step approach to private mental health care"))
Unable to connect optimized C data functions [No module named '_testbuffer'], falling back to pure Python
All data from https://psymplicity.com/ already exists in the database.
Traceback (most recent call last):
File "c:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\code\flask_app_2\embedchain_test.py", line 21, in
print(naval_chat_bot.query("what is the three-step approach to private mental health care"))
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\embedchain\embedchain.py", line 225, in query
answer = self.get_answer_from_llm(prompt)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\embedchain\embedchain.py", line 211, in get_answer_from_llm
answer = self.get_openai_answer(prompt)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\embedchain\embedchain.py", line 162, in get_openai_answer
response = openai.ChatCompletion.create(
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_requestor.py", line 298, in request
resp, got_stream = self._interpret_response(result, stream)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_requestor.py", line 700, in _interpret_response
self._interpret_response_line(
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_requestor.py", line 743, in _interpret_response_line
raise error.ServiceUnavailableError(
openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.