Comments (9)
Two notes:
- Something in the request chunking module was probably broken by a recent update. This is the second issue of this type (exceeding the token limit even though the model is supported) in less than a week.
- The OpenAI errors appear because DeepSeek is invoked through the OpenAI module in LangChain. LangChain does not provide direct support for DeepSeek, but DeepSeek models expose an OpenAI-like API.
from scrapegraph-ai.
Hi, when the input exceeds 32k, our algorithm creates another API call, so it should not be a problem.
If you have errors, please send us the script and we can take a look.
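For illustration only, here is a minimal sketch of the kind of splitting described above. It is not ScrapeGraphAI's actual chunking code: `chunk_tokens` and its parameters are hypothetical names, and a plain list of strings stands in for real tokenizer output. The point it shows is that each chunk has to leave room for the completion budget, not just fit under the context window.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[str],
    context_window: int,
    max_completion_tokens: int,
    prompt_overhead: int = 0,
) -> List[List[str]]:
    """Split tokens so each chunk plus the completion fits in the window.

    Hypothetical helper: prompt_overhead stands for tokens used by the
    instruction text that is sent along with every chunk.
    """
    budget = context_window - max_completion_tokens - prompt_overhead
    if budget <= 0:
        raise ValueError("no room left for input tokens")
    # One API call would be made per chunk.
    return [list(tokens[i:i + budget]) for i in range(0, len(tokens), budget)]


# Illustrative numbers: a 32768-token window with 4096 tokens reserved
# for the completion leaves 28672 tokens of input per call.
tokens = ["tok"] * 32221
chunks = chunk_tokens(tokens, context_window=32768, max_completion_tokens=4096)
print([len(c) for c in chunks])  # [28672, 3549]
```

If the splitter only compared the input against the full 32768-token window, an input of this size would be sent as a single call and the completion budget would push the request over the limit.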
from scrapegraph-ai.
Thanks for your help, Vinci.
The following is the script. It is a boundary case: the user's input is smaller than, but very close to, 32K, and I set max_tokens to 4096. When max_tokens is not specified, it outputs only part of the expected links because it runs out of tokens. When I set max_tokens to 4096, it fails with this error:
File "/Users/david/.pyenv/versions/3.10.13/envs/scraper/lib/python3.10/site-packages/openai/_base_client.py", line 921, in request
return self._request(
File "/Users/david/.pyenv/versions/3.10.13/envs/scraper/lib/python3.10/site-packages/openai/_base_client.py", line 1020, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'detail': "This model's maximum context length is 32768 tokens. However, you requested 36317 tokens (32221 in the messages, 4096 in the completion). Please reduce the length of the messages or completion."}
The script:
import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()
deepseek_key = os.getenv("DEEPSEEK_KEY")

graph_config = {
    "llm": {
        "api_key": deepseek_key,
        "model": "deepseek-chat",
        "temperature": 0.7,
        "max_tokens": 4096,  # max output tokens limited to 4k for gpt-4o, gpt-4-turbo
        "base_url": "https://api.deepseek.com/v1"
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
        "base_url": "http://localhost:11434",  # set ollama URL
    },
    "headless": False,
    "verbose": True,
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************
smart_scraper_graph = SmartScraperGraph(
    prompt="extract all page title and links under,footage-card-wrapper as format of: [{\"title\": \"xxx\", \"link\":\"xxx\" }] ",
    # also accepts a string with the already downloaded HTML code
    source="https://stock.xinpianchang.com/footages/2997636.html",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)
from scrapegraph-ai.
Please update to the new version.
from scrapegraph-ai.
Thanks for your help. I updated to 1.7.3, and still got the same error:
openai.BadRequestError: Error code: 400 - {'detail': "This model's maximum context length is 32768 tokens. However, you requested 36294 tokens (32198 in the messages, 4096 in the completion). Please reduce the length of the messages or completion."}
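The numbers in this error message can be checked directly. The sketch below only does the arithmetic; `input_budget` and `fits` are hypothetical helper names, not part of any library, and the constants are copied from the error text and the config above.

```python
CONTEXT_WINDOW = 32768  # "maximum context length" from the error message
MAX_TOKENS = 4096       # completion budget set via "max_tokens" in the config
PROMPT_TOKENS = 32198   # "in the messages" from the error message


def input_budget(context_window: int, max_completion_tokens: int) -> int:
    """Tokens left for the messages once the completion is reserved."""
    return context_window - max_completion_tokens


def fits(prompt_tokens: int, context_window: int, max_completion_tokens: int) -> bool:
    """True if the messages plus the reserved completion fit in the window."""
    return prompt_tokens + max_completion_tokens <= context_window


print(PROMPT_TOKENS + MAX_TOKENS)                       # 36294
print(input_budget(CONTEXT_WINDOW, MAX_TOKENS))         # 28672
print(fits(PROMPT_TOKENS, CONTEXT_WINDOW, MAX_TOKENS))  # False
```

The messages alone (32198 tokens) are under the 32768-token window, which would explain why no second call was created, but they exceed the 28672 tokens that remain once 4096 tokens are reserved for the completion.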
from scrapegraph-ai.
Why is there an OpenAI error? Have you changed the provider?
from scrapegraph-ai.
The code is pasted above. The model I use is "deepseek-chat". I wonder what caused it to show OpenAI errors.
from scrapegraph-ai.
I am experiencing the same issue using OpenAI models. As you said, the chunking module is probably broken, which is concerning since it results in an error.
from scrapegraph-ai.
Hi, please update to the new beta version. If you still have this problem, please reopen the issue.
from scrapegraph-ai.