gkamradt / langchain-tutorials
Overview and tutorial of the LangChain Library
Hey there
Thanks for putting this together. I came to the same conclusion regarding the summarisation of a large document: split, embed, rank the sections, and choose the most relevant ones for a map_reduce.
However, I've been scouring the net and racking my brains to find a splitter that works according to theme (e.g. keyword density) or that can identify chapter/section breaks without my having to pre-define what the markup looks like.
Is there a Python tool or form of analysis that can segment a text document into smaller parts more intelligently than a character-length breakpoint?
Thanks :)
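One rough way to approximate a theme-based split, sketched below with only the standard library, is to compare the vocabulary of neighbouring paragraphs and start a new segment wherever the overlap drops. This is a simplified illustration in the spirit of lexical-cohesion methods, not a LangChain splitter; the function names, the tiny stopword list, and the 0.1 threshold are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real one would be longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "for", "on", "with", "that", "this", "are", "was", "as", "by"}


def bag_of_words(text):
    """Count lowercase word occurrences, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_by_topic(paragraphs, threshold=0.1):
    """Group consecutive paragraphs; open a new segment when the
    similarity between neighbours falls below the threshold."""
    if not paragraphs:
        return []
    segments = [[paragraphs[0]]]
    for prev, cur in zip(paragraphs, paragraphs[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments
```

Swapping the word counts for embedding vectors would keep the same structure; the threshold would need tuning per corpus either way.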
Hello! I am receiving this NameError after this line: chain.run(input_documents=docs, question=query)
NameError Traceback (most recent call last)
Cell In[32], line 1
----> 1 chain.run(input_documents=docs, question=query)
NameError: name 'chain' is not defined
In GPT4 I get this answer:
The NameError indicates that the interpreter is unable to find a defined variable or function named chain. This error occurs when the name is not defined in the current scope, or there is a typo in the name.
To fix the error, you need to ensure that the variable chain is defined in the current scope. Check to see if you have defined chain earlier in the code or in a different module that you may have forgotten to import.
In some cases, running the cells from the beginning after selecting "Restart & Clear Output" may resolve the issue [1]. It may also be helpful to review the documentation for the package or module you are using to ensure that you are using it correctly. Checking for any typos in the variable name may also help resolve the issue.
Overall, the NameError indicates that the interpreter is unable to find a defined variable or function. To fix the error, ensure that the name is defined in the current scope or imported from another module, and check for any typos in the name.
When I run the notebook: https://github.com/gkamradt/langchain-tutorials/blob/main/getting_started/Quickstart%20Guide.ipynb
`
#!pip install google-search-results
import os
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI
llm = OpenAI(temperature=0)
os.environ["SERPAPI_API_KEY"] = ""
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("Who is the current leader of Japan? What is the largest prime number that is smaller than their age?")
`
`
Entering new AgentExecutor chain...
I need to find out who the leader of Japan is and then calculate the largest prime number that is smaller than their age.
Action: Search
Action Input: "current leader of Japan"
Observation: Fumio Kishida
Thought: I need to find out the age of the leader of Japan
Action: Search
Action Input: "age of Fumio Kishida"
Observation: 65 years
Thought: I need to calculate the largest prime number that is smaller than 65
Action: Calculator
Action Input: 65
ValueError Traceback (most recent call last)
Cell In[18], line 2
1 # Now let's test it out!
----> 2 agent.run("Who is the current leader of Japan? What is the largest prime number that is smaller than their age?")
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:213, in Chain.run(self, *args, **kwargs)
211 if len(args) != 1:
212 raise ValueError("`run` supports only one positional argument.")
--> 213 return self(args[0])[self.output_keys[0]]
215 if kwargs and not args:
216 return self(kwargs)[self.output_keys[0]]
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116 raise e
117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
118 return self.prep_outputs(inputs, outputs, return_only_outputs)
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
107 self.callback_manager.on_chain_start(
108 {"name": self.__class__.__name__},
109 inputs,
110 verbose=self.verbose,
111 )
112 try:
--> 113 outputs = self._call(inputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/agents/agent.py:792, in AgentExecutor._call(self, inputs)
790 # We now enter the agent loop (until it returns something).
791 while self._should_continue(iterations, time_elapsed):
--> 792 next_step_output = self._take_next_step(
793 name_to_tool_map, color_mapping, inputs, intermediate_steps
794 )
795 if isinstance(next_step_output, AgentFinish):
796 return self._return(next_step_output, intermediate_steps)
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/agents/agent.py:695, in AgentExecutor._take_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps)
693 tool_run_kwargs["llm_prefix"] = ""
694 # We then call the tool on the tool input to get an observation
--> 695 observation = tool.run(
696 agent_action.tool_input,
697 verbose=self.verbose,
698 color=color,
699 **tool_run_kwargs,
700 )
701 else:
702 tool_run_kwargs = self.agent.tool_run_logging_kwargs()
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/tools/base.py:107, in BaseTool.run(self, tool_input, verbose, start_color, color, **kwargs)
105 except (Exception, KeyboardInterrupt) as e:
106 self.callback_manager.on_tool_error(e, verbose=verbose_)
--> 107 raise e
108 self.callback_manager.on_tool_end(
109 observation, verbose=verbose_, color=color, name=self.name, **kwargs
110 )
111 return observation
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/tools/base.py:104, in BaseTool.run(self, tool_input, verbose, start_color, color, **kwargs)
102 try:
103 tool_args, tool_kwargs = _to_args_and_kwargs(tool_input)
--> 104 observation = self._run(*tool_args, **tool_kwargs)
105 except (Exception, KeyboardInterrupt) as e:
106 self.callback_manager.on_tool_error(e, verbose=verbose_)
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/agents/tools.py:31, in Tool._run(self, *args, **kwargs)
29 def _run(self, *args: Any, **kwargs: Any) -> str:
30 """Use the tool."""
---> 31 return self.func(*args, **kwargs)
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:213, in Chain.run(self, *args, **kwargs)
211 if len(args) != 1:
212 raise ValueError("`run` supports only one positional argument.")
--> 213 return self(args[0])[self.output_keys[0]]
215 if kwargs and not args:
216 return self(kwargs)[self.output_keys[0]]
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116 raise e
117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
118 return self.prep_outputs(inputs, outputs, return_only_outputs)
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
107 self.callback_manager.on_chain_start(
108 {"name": self.__class__.__name__},
109 inputs,
110 verbose=self.verbose,
111 )
112 try:
--> 113 outputs = self._call(inputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/llm_math/base.py:130, in LLMMathChain._call(self, inputs)
126 self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
127 llm_output = llm_executor.predict(
128 question=inputs[self.input_key], stop=["```output"]
129 )
--> 130 return self._process_llm_result(llm_output)
File /opt/miniconda3/envs/py38_langchain/lib/python3.8/site-packages/langchain/chains/llm_math/base.py:86, in LLMMathChain._process_llm_result(self, llm_output)
84 answer = "Answer: " + llm_output.split("Answer:")[-1]
85 else:
---> 86 raise ValueError(f"unknown format from LLM: {llm_output}")
87 return {self.output_key: answer}
ValueError: unknown format from LLM: This is not a math problem and cannot be translated into an expression that can be executed using Python's numexpr library.`
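The trace shows the agent handing the bare input "65" to the Calculator tool, which the llm-math chain rejects because it is not phrased as a math problem. As a sanity check on the expected answer, the arithmetic can be done without any LLM. This is a plain-Python sketch with hypothetical helper names, not a fix for the agent itself.

```python
def is_prime(n):
    """Trial division; fine for small numbers such as an age."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_below(n):
    """Return the largest prime strictly smaller than n, or None."""
    for candidate in range(n - 1, 1, -1):
        if is_prime(candidate):
            return candidate
    return None


print(largest_prime_below(65))  # 61
```

Rephrasing the question so that the tool input reads as an explicit math problem may also help, though that depends on the model's output for a given run.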
They just closed Pinecone's free tier and moved it to a waiting list (and the next level is $70/month). Can someone explain how to get this working with an alternative to Pinecone?
It would be nice to pin the versions in the requirements file, e.g. via pip freeze, because LangChain is changing so much and sometimes breaks.
chain = create_extraction_chain(llm, schema, encoder_or_encoder_class='json')
output = chain.predict_and_parse(text="please add 15 more units sold to 2023")['data']
printOutput(output)
Running this code block throws a TypeError : initial_value must be str or None, not dict.
I was able to get output in json format using:
chain.predict(text=text)["data"]
Suppose I have multiple chunks and I want to build an application where I ask questions that require fetching across multiple chunks. For example, I have detailed experience reports of a trek from 100 people, and I want to query how many of them went prepared with a first aid kit and how many of them needed to use it. What type of chunking and retrieval is most appropriate for this?
Thanks for the cookbook. Pretty insightful.
In the section for VectorStores (under Indexes), the embeddings of the text are created using embeddings.embed_documents(), but the vectorstore (FAISS) class is imported and never used, for example as:
db = FAISS.from_documents(texts, embeddings)
Maybe the section should include creation of the vectorstore and its usage.
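To make the role of the vectorstore concrete, here is a toy in-memory version in plain Python: it stores (vector, text) pairs and ranks them by cosine similarity against a query vector. The class and method names are hypothetical and the vectors are hand-made stand-ins for real embeddings; this is not the FAISS or LangChain API, only a sketch of the idea behind it.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Toy store: keeps (vector, text) pairs and ranks by similarity."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((list(vector), text))

    def similarity_search(self, query_vector, k=2):
        scored = [(_cosine(query_vector, v), t) for v, t in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = TinyVectorStore()
store.add([1.0, 0.0], "about cats")
store.add([0.0, 1.0], "about finance")
store.add([0.9, 0.1], "about kittens")
top = store.similarity_search([1.0, 0.0], k=2)
```

A real vectorstore adds indexing so the search does not have to score every item, but the inputs and outputs have the same shape.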
I know we can print the documents that match the question via Pinecone, but it would be great to be able to print a citation or source that was used to determine the final answer if this can be added as a feature? Great work btw.
Sometimes I find the final summary is too short. How can I make it longer?
Thanks!
It would be helpful to explicitly add a !pip install langchain openai cell to the top of the LangChain Cookbooks. Otherwise, users have to play whack-a-mole with installing packages as they work down the notebook.
Any idea how to solve this?
I want to use a RetrievalQA chain for question answering over a PDF, but I don't know why my response is always cut off, even though the total is only 1800 or 2500 tokens.
So I want to use map_reduce, but it gives me an error.
My code is:
`
top_matches = vector_db.similarity_search_with_score(query=question, k=int(top_k))
top_matches_contexts = " "
print("top_matches", top_matches)
print("top_k", top_k)
for i, val in enumerate(top_matches):
    top_matches_contexts += "{}、{}\n".format(i + 1, val[0].page_content)
# No matches found (comparing the string to [] was always False)
if not top_matches:
    return "Please describe your question in detail"
top_matches_contexts = remove_spaces_and_newlines(top_matches_contexts)
global language
query = prompt.format(query=question, reference=top_matches_contexts, key_word=key_word, language=language)
# chat_history,top_matches_contexts = deal_total_token(chat_history,query)
qa_chain = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0.0, openai_api_key="XXXXX"), chain_type="map_reduce",
                                       retriever=vector_db.as_retriever())
result = qa_chain({"query": query, "chat_history": chat_history})
count_tokens(qa_chain, query)
`
Hi. I have a quick question. Do you have any example with the SimpleMongo DB loader? Have you tested this connector?
I am having difficulty connecting to MongoDB with a username and password, as the documentation uses only host and port. Also, can I use the HTML reader to read a website and use a LangChain agent to store the content locally?
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4114 tokens (3858 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
Not sure how to reduce the max_tokens, or prompt size.
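The numbers in the error message make the budget explicit: prompt plus completion must fit in the context window. A small sketch of that arithmetic, using only the figures quoted above (the helper names are made up for illustration):

```python
CONTEXT_LIMIT = 4097      # from the error message
PROMPT_TOKENS = 3858      # from the error message
COMPLETION_TOKENS = 256   # from the error message


def overflow(prompt_tokens, completion_tokens, context_limit=CONTEXT_LIMIT):
    """How many tokens the request exceeds the context window by."""
    return max(0, prompt_tokens + completion_tokens - context_limit)


def max_completion(prompt_tokens, context_limit=CONTEXT_LIMIT):
    """Largest completion budget that still fits with this prompt."""
    return max(0, context_limit - prompt_tokens)


print(overflow(PROMPT_TOKENS, COMPLETION_TOKENS))  # 17
print(max_completion(PROMPT_TOKENS))               # 239
```

So either the prompt has to shrink by at least 17 tokens (fewer or smaller retrieved chunks), or the completion budget has to drop from 256 to 239 or less.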
Hi, I am going through your email tutorial and I like it a lot. However, one thing remains unclear and is left without any comment. Perhaps it would be good to clarify?
Input variables are 'input_documents', 'company', etc., but the map template uses 'text', and so does the reduce template.
Which text is it? Is it the same value? I guess not, but it is not mentioned, and this is the only part of your code that leaves me with questions.
Is it recognized by position?
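As far as I understand the map_reduce pattern, the variable is matched by name, not by position, and it is filled with different values in the two steps: one document's content per call in the map step, and the joined partial summaries in the reduce step. The sketch below imitates that with plain string templates and a recording stand-in for the model; the templates, `map_reduce`, and `recording_llm` are hypothetical and only illustrate the data flow, not LangChain's internals.

```python
MAP_TEMPLATE = "Summarize this email for {company}:\n{text}"
REDUCE_TEMPLATE = "Combine these partial summaries for {company}:\n{text}"

calls = []  # every prompt the stand-in model receives


def recording_llm(prompt):
    """Stand-in for a model call: record the prompt, return a label."""
    calls.append(prompt)
    return f"summary#{len(calls)}"


def map_reduce(docs, company, llm):
    # Map step: {text} is one document's content per call.
    partials = [llm(MAP_TEMPLATE.format(text=d, company=company)) for d in docs]
    # Reduce step: {text} is now the partial summaries joined together.
    combined = "\n\n".join(partials)
    return llm(REDUCE_TEMPLATE.format(text=combined, company=company))


result = map_reduce(["doc one", "doc two"], "Acme", recording_llm)
```

Extra variables such as 'company' are passed through unchanged to every call, which is why they appear in both templates with the same value.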
Running the exact same code as "Ask A Book Questions.ipynb", I ran into the following error:
Diagnosis (they all return the above error):
What is the problem?
In the Clean and Standardize Data section: suppose I already have some mapping data, and the amount exceeds the token limit, so I cannot put it in the prompt as examples. Is there any method to let the LLM know about this data and use it?
Anyone know how I fix this error?
ValidationError: 1 validation error for FewShotPromptTemplate
example_selector
instance of BaseExampleSelector expected (type=type_error.arbitrary_type; expected_arbitrary_type=BaseExampleSelector)
Hi there, thanks for solving my issue about loading PDFs. I came across another issue and suspect it may relate to the versions of some Python packages.
I am trying the Ask A Book Questions tutorial and get the error below when executing this line: docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
Traceback (most recent call last):
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
conn.connect()
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connection.py", line 362, in connect
self.sock = ssl_wrap_socket(
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/util/ssl_.py", line 386, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1040, in _create
self.do_handshake()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: UNEXPECTED_RECORD] unexpected record (_ssl.c:1129)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/urllib3/util/retry.py", line 446, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by SSLError(SSLError(1, '[SSL: UNEXPECTED_RECORD] unexpected record (_ssl.c:1129)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/serena/Documents/langchain-tutorials/data_generation/chatPDF.py", line 33, in <module>
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/vectorstores/pinecone.py", line 235, in from_texts
embeds = embedding.embed_documents(lines_batch)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/embeddings/openai.py", line 269, in embed_documents
return self._get_len_safe_embeddings(texts, engine=self.deployment)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/embeddings/openai.py", line 188, in _get_len_safe_embeddings
encoding = tiktoken.model.encoding_for_model(self.model)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/model.py", line 75, in encoding_for_model
return get_encoding(encoding_name)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/registry.py", line 63, in get_encoding
enc = Encoding(**constructor())
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken_ext/openai_public.py", line 64, in cl100k_base
mergeable_ranks = load_tiktoken_bpe(
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/load.py", line 114, in load_tiktoken_bpe
contents = read_file_cached(tiktoken_bpe_file)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/load.py", line 46, in read_file_cached
contents = read_file(blobpath)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/tiktoken/load.py", line 24, in read_file
return requests.get(blobpath).content
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/requests/adapters.py", line 563, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by SSLError(SSLError(1, '[SSL: UNEXPECTED_RECORD] unexpected record (_ssl.c:1129)')))
Appreciate your help in advance!
Code:
loader_book = PyPDFLoader("D:/PaperPal/langchain-tutorials/data/The Attention Merchants_ The Epic Scramble to Get Inside Our Heads ( PDFDrive ) (1).pdf")
test = loader_book.load()
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(test[0])
I get the following error even when the test[0] is a Document object
> Entering new MapReduceDocumentsChain chain...
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
d:\PaperPal\langchain-tutorials\chains\Chain Types.ipynb Cell 19 in ()
----> 1 chain.run(test[0])
File c:\Users\mail2\anaconda3\lib\site-packages\langchain\chains\base.py:213, in Chain.run(self, *args, **kwargs)
211 if len(args) != 1:
212 raise ValueError("`run` supports only one positional argument.")
--> 213 return self(args[0])[self.output_keys[0]]
215 if kwargs and not args:
216 return self(kwargs)[self.output_keys[0]]
File c:\Users\mail2\anaconda3\lib\site-packages\langchain\chains\base.py:116, in Chain.__call__(self, inputs, return_only_outputs)
114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 116 raise e
117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
118 return self.prep_outputs(inputs, outputs, return_only_outputs)
File c:\Users\mail2\anaconda3\lib\site-packages\langchain\chains\base.py:113, in Chain.__call__(self, inputs, return_only_outputs)
107 self.callback_manager.on_chain_start(
108 {"name": self.__class__.__name__},
109 inputs,
110 verbose=self.verbose,
111 )
...
--> 141 [{**{self.document_variable_name: d.page_content}, **kwargs} for d in docs]
142 )
143 return self._process_results(results, docs, token_max, **kwargs)
AttributeError: 'tuple' object has no attribute 'page_content'
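A likely explanation, assuming the Document class behaves like a pydantic model: the chain iterates over whatever it is given (the `for d in docs` in the trace), and iterating over a single model object yields (field name, value) tuples rather than documents. The sketch below reproduces that with a hypothetical stand-in class, so it runs without LangChain installed.

```python
class Doc:
    """Hypothetical stand-in for a Document. Iterating over it yields
    (field, value) pairs, mimicking how a pydantic model iterates."""

    def __init__(self, page_content):
        self.page_content = page_content

    def __iter__(self):
        yield ("page_content", self.page_content)


def contents(docs):
    """Same access pattern as the comprehension in the traceback."""
    return [d.page_content for d in docs]


single = Doc("page one")

try:
    contents(single)          # iterates over the Doc itself -> tuples
except AttributeError as err:
    message = str(err)        # 'tuple' object has no attribute 'page_content'

fixed = contents([single])    # wrap the single document in a list
```

If that is what is happening here, passing a list, e.g. chain.run([test[0]]) or chain.run(test), should avoid the error.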
MaxRetryError: HTTPSConnectionPool(host='controller.pinecone_api_env.pinecone.io', port=443): Max retries exceeded with url: /databases (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe2b3ed8250>: Failed to establish a new connection: [Errno -2] Name or service not known'))
I got this error. The version of langchain is 0.0.169. I want to know how to fix this error.
ValidationError Traceback (most recent call last)
Cell In[2], line 3
1 # db = SQLDatabase.from_uri("sqlite:///../../../../notebooks/Chinook.db")
2 db = SQLDatabase.from_uri("sqlite:///../../notebooks/Chinook.db")
----> 3 toolkit = SQLDatabaseToolkit(db=db)
4 agent_executor = create_sql_agent(
5 llm = OpenAI(temperature=0, openai_api_key="sk-xxx"),
6 toolkit=toolkit,
7 verbose=True
8 )
File ~/miniconda3/envs/aigc/lib/python3.10/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for SQLDatabaseToolkit
llm
field required (type=value_error.missing)
The notebook is out of sync with the current version of Pinecone. Here are some thoughts:
from langchain.vectorstores import Chroma, Pinecone: I think it's better to install langchain_pinecone and use from langchain_pinecone import PineconeVectorStore: https://python.langchain.com/docs/integrations/vectorstores/pinecone
pip install pinecone-client: you'll need to change import pinecone to from pinecone import Pinecone. When creating an index, you'll also need to import the ServerlessSpec or PodSpec class.
environment: instead of:
pinecone.init(
    api_key=PINECONE_API_KEY, # find at app.pinecone.io
    environment=PINECONE_API_ENV # next to api key in console
)
you use:
pc = Pinecone(api_key=PINECONE_API_KEY)
For summarization methods above level 3, the best practice is to use TokenTextSplitter rather than RecursiveCharacterTextSplitter, because the number of tokens in a string of a given character length varies greatly from language to language.
text_splitter_by_char = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=10000, chunk_overlap=500)
text_splitter_by_token = TokenTextSplitter(chunk_size=3000, chunk_overlap=100)
If this is not taken into account, errors exceeding the max token count are likely to occur when processing text in multiple languages.
I have tested the number of tokens used for the same family of patents, in different languages:
English (US10901237B2)=21823 (100%)
Simplified Chinese (CN112904591A)=30901 (142%)
Traditional Chinese (TW201940135A)=36530 (167%)
Korean (KR20190089752A)=42644 (195%)
Japanese (JP2019128599A)=51430 (236%)
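The percentages above are each count divided by the English count. A short sketch that recomputes them from the reported numbers (the dictionary keys are just labels for this example):

```python
# Token counts reported above for the same patent family.
token_counts = {
    "English": 21823,
    "Simplified Chinese": 30901,
    "Traditional Chinese": 36530,
    "Korean": 42644,
    "Japanese": 51430,
}

baseline = token_counts["English"]

# Relative token cost, as a percentage of the English count.
relative = {
    lang: round(100 * count / baseline)
    for lang, count in token_counts.items()
}

for lang, pct in relative.items():
    print(f"{lang}: {pct}%")
```

With more than a 2x spread between English and Japanese, a chunk size chosen in characters for one language can easily overshoot the token limit in another, which is the argument for splitting by tokens.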
Hey Gregory,
Thank you for the great series on YouTube. I have a question regarding the notebook 'Ask A Book Questions.ipynb' that you used to demonstrate querying some custom knowledge from PDF files.
In the 11th cell, you used this code to load the vectors into Pinecone:
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
Subsequently, you used docsearch again in your query:
query = "What are examples of good data science teams?"
docs = docsearch.similarity_search(query, include_metadata=True)
My question is: would this be using the index from Pinecone? In your example here, you loaded the vectors into Pinecone earlier, so the data is already in docsearch. But for a use case where you want to read the existing index directly, without loading any documents, would you use from_existing_index instead? E.g.:
docsearch = Pinecone.from_existing_index(pinecone_index_name, embeddings)
Hello
If I try to execute the tutorial, either on Colab or locally, I always get the following error
NameError: name 'UnstructuredPDFLoader' is not defined
even if I install all packages as shown on https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/unstructured_file.html
Hi there, I was trying the Ask A Book Questions tutorial. However, I got stuck at the third line: data = loader.load().
Do you have any idea why it says my document is not a zip file? It is actually loading a PDF.
Here is the stack trace:
Traceback (most recent call last):
File "/Users/serena/Documents/langchain-tutorials/data_generation/chatPDF.py", line 5, in <module>
data = loader.load()
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/document_loaders/unstructured.py", line 61, in load
elements = self._get_elements()
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/langchain/document_loaders/pdf.py", line 27, in _get_elements
from unstructured.partition.pdf import partition_pdf
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/partition/pdf.py", line 19, in <module>
from unstructured.partition.text import partition_text
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/partition/text.py", line 16, in <module>
from unstructured.partition.text_type import (
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/partition/text_type.py", line 21, in <module>
from unstructured.nlp.tokenize import pos_tag, sent_tokenize, word_tokenize
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/nlp/tokenize.py", line 32, in <module>
_download_nltk_package_if_not_present(package_name, package_category)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/unstructured/nlp/tokenize.py", line 21, in _download_nltk_package_if_not_present
nltk.find(f"{package_category}/{package_name}")
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 555, in find
return find(modified_name, paths)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 542, in find
return ZipFilePathPointer(p, zipentry)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/compat.py", line 41, in _decorator
return init_func(*args, **kwargs)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 394, in __init__
zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/compat.py", line 41, in _decorator
return init_func(*args, **kwargs)
File "/Users/serena/Library/Python/3.9/lib/python/site-packages/nltk/data.py", line 935, in __init__
zipfile.ZipFile.__init__(self, filename)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1257, in __init__
self._RealGetContents()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
When I ran the code for 'With Streaming' in ChatAPI + LangChain Basics.ipynb, I encountered an error: 'cannot import name 'CallbackManager' from 'langchain.callbacks.base'.'
Upon further investigation in the LangChain documentation, I discovered that the package containing CallbackManager has been modified.
Your vectorstore store your embeddings (☝️) and make "the" easily searchable
I guess it should be: "Your vectorstore store your embeddings (☝️) and make "them" easily searchable" :)
Thanks
When I run the level 3 map reduce code, I get an error like:
ValueError: OpenAIChat currently only supports single prompt, got
I think it comes from this code:
output = summary_chain.run(docs)
I have searched the internet and still have not found a solution. How can I solve this?
My local environment:
Python 3.9.5 (default, May 18 2021, 12:31:01)
langchain 0.0.167
openai 0.27.6
I have had this model working and it is great, but now I'm getting different error messages.
ValidationError Traceback (most recent call last)
Cell In[72], line 2
1 llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
----> 2 chain = load_qa_chain(llm, chain_type="stuff")
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py:218, in load_qa_chain(llm, chain_type, verbose, callback_manager, **kwargs)
213 if chain_type not in loader_mapping:
214 raise ValueError(
215 f"Got unsupported chain type: {chain_type}. "
216 f"Should be one of {loader_mapping.keys()}"
217 )
--> 218 return loader_mapping[chain_type](
219 llm, verbose=verbose, callback_manager=callback_manager, **kwargs
220 )
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py:63, in _load_stuff_chain(llm, prompt, document_variable_name, verbose, callback_manager, **kwargs)
54 def _load_stuff_chain(
55 llm: BaseLanguageModel,
56 prompt: Optional[BasePromptTemplate] = None,
(...)
60 **kwargs: Any,
61 ) -> StuffDocumentsChain:
62 _prompt = prompt or stuff_prompt.PROMPT_SELECTOR.get_prompt(llm)
---> 63 llm_chain = LLMChain(
64 llm=llm, prompt=prompt, verbose=verbose, callback_manager=callback_manager
65 )
66 # TODO: document prompt
67 return StuffDocumentsChain(
68 llm_chain=llm_chain,
69 document_variable_name=document_variable_name,
(...)
72 **kwargs,
73 )
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for LLMChain
prompt
none is not an allowed value (type=type_error.none.not_allowed)
ChatGPT4 response:
The error message indicates a ValidationError due to an invalid value for the prompt argument in the LLMChain constructor. Specifically, the error message states that None is not an allowed value for prompt. This error can occur if the prompt argument is not properly specified when creating an instance of the LLMChain class.
To fix this error, the prompt argument should be properly specified when creating an instance of the LLMChain class. This can be done by providing a valid value for the prompt argument that is not None. Additionally, the error message suggests that the value None is not an allowed value for the prompt argument, so it is important to consult the documentation or source code of the LLMChain class to determine what values are valid for the prompt argument.
Hello,
let me first of all say, you have created a great tutorial on how to create a Q&A engine for any pdf-document based knowledge base. I love it!!!
I setup a Google Colab notebook to replicate your tutorial and came across quite a few issues during the environment setup.
The following screenshot shows all setup tasks needed to make it run successfully on Google Colab. I hope other readers find this useful.
Hello lang-chain devs,
How do I report a mistake in documentation? I was not sure if this is the right forum for pointing this out.
The sentence in documentation should probably read:
A text embedding model takes a piece of text as input and returns a numerical representation of that text in the form of a list of floats.
The original documentation is missing "returns a".