Comments (8)

boerninator commented on August 20, 2024

I also get this error with Azure.

from scrapegraph-ai.

marcantoinefortier commented on August 20, 2024

@mingjun1120 Unfortunately, I haven't tried versions below 1.7.0, so I can't suggest a workaround for now. 😅

mingjun1120 commented on August 20, 2024

I am also facing this error now. Below is my code:

import os
import json
from typing import List
from pydantic import BaseModel, Field
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

# Define the output schema for the graph
class FAQLink(BaseModel):
    text: str = Field(description="The text of the link")
    url: str = Field(description="The URL of the link")

class FAQCategory(BaseModel):
    header: str = Field(description="The header of the FAQ category")
    links: List[FAQLink] = Field(description="The list of links in this category")

class FAQStructure(BaseModel):
    categories: List[FAQCategory] = Field(description="The list of FAQ categories")

# Initialize the model instances
llm_model_instance = AzureChatOpenAI(
    openai_api_key = os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_version = os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment = os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"],
)

embedder_model_instance = AzureOpenAIEmbeddings(
    openai_api_key = os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_version = os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment = os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
)

graph_config = {
    "llm": {"model_instance": llm_model_instance},
    "embeddings": {"model_instance": embedder_model_instance}
}

# Create the SmartScraperGraph instance and run it
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract all FAQ categories, their headers, and the links (text and URL) within each category from the CIMB bank FAQ page",
    source="https://www.cimb.com.my/en/personal/help-support/faq.html",
    schema=FAQStructure,
    config=graph_config
)

result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
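
The snippet above reads five environment variables with `os.environ[...]`, so an unset variable surfaces as a bare `KeyError` naming only the first missing one. A minimal sketch of a fail-fast check that reports all of them at once is below. The helper names `missing_env_vars` and `require_env` are hypothetical (they are not part of scrapegraphai or langchain), and the variable names are copied from the snippet above.

```python
import os

# Variable names copied from the snippet above.
REQUIRED_ENV_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in `environ` (default: os.environ)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def require_env(names, environ=None):
    """Raise a single RuntimeError listing every missing variable (hypothetical helper)."""
    missing = missing_env_vars(names, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env(REQUIRED_ENV_VARS)` right after `load_dotenv()` would rule out configuration problems before the graph is built. It does not address the error discussed in this thread, which is raised later, inside the graph constructor.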

marcantoinefortier commented on August 20, 2024

Using the code provided in the Azure example from the documentation results in this error with versions between 1.7.0 and 1.8.0.
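
To see whether an environment falls in that range, the installed version can be read with the standard library's `importlib.metadata`. This is only a rough sketch: it assumes the distribution is installed under the name `scrapegraphai`, it treats the reported range as inclusive, and it ignores pre-release suffixes. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Parse the leading numeric part of a version string into a 3-tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    parts = [int(part) for part in match.group(0).split(".")][:3]
    # Pad short versions such as "1.7" to (1, 7, 0) so comparisons stay consistent.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_reported_range(version_string, low=(1, 7, 0), high=(1, 8, 0)):
    """True if the version lies within the range reported in this thread (inclusive)."""
    return low <= release_tuple(version_string) <= high


def installed_version(dist_name="scrapegraphai"):
    """Return the installed version of `dist_name`, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

For example, `in_reported_range(installed_version())` could be evaluated once `installed_version()` is known not to be `None`. The bounds are just the ones quoted in this comment, so a result of `False` does not by itself mean a version is unaffected.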

mingjun1120 commented on August 20, 2024

"Using the code provided in the Azure example from the documentation results in this error with versions between 1.7.0 and 1.8.0."

So, should we install a version below 1.7.0?

f-aguzzi commented on August 20, 2024

The official example for Azure seems very off to me for some reason. It's completely different from any other SmartScraper example for the other API providers. It's weird that the Langchain classes are accessed directly, skipping a piece of the usual Scrapegraph workflow.

Try building from this example (I edited the official example to give it the usual Scrapegraph-style structure, but I have no idea if it will work, because I don't have access to Azure to test it):

""" 
Basic example of scraping pipeline using SmartScraper using Azure OpenAI Key
"""

import os
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info


# required environment variable in .env
# AZURE_OPENAI_KEY


graph_config = {
    "llm": {
        "api_key": os.environ["AZURE_OPENAI_KEY"],
        "model": "azure/gpt-3.5-turbo",
    },
    "verbose": True,
    "headless": False
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

smart_scraper_graph = SmartScraperGraph(
    prompt="""List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time, 
    event_end_date, event_end_time, location, event_mode, event_category, 
    third_party_redirect, no_of_days, 
    time_in_hours, hosted_or_attending, refreshments_type, 
    registration_available, registration_link""",
    # also accepts a string with the already downloaded HTML code
    source="https://www.hmhco.com/event",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

Could be completely wrong, but if it works, we'll put it in place instead of the current example.

koushik-27 commented on August 20, 2024

We are getting the same error with Hugging Face models as well, using scrapegraphai 1.9.1.
Here is the code (the HF token is masked):

from langchain_huggingface import HuggingFaceEndpoint, HuggingFaceEndpointEmbeddings
from scrapegraphai.graphs import SmartScraperGraph

llm_model = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    huggingfacehub_api_token="hf_xxxx",
)

embedder_model = HuggingFaceEndpointEmbeddings(
    huggingfacehub_api_token="hf_xxxx",
    model="sentence-transformers/all-MiniLM-l6-v2",
)

graph_config = {
    "llm": {
        "model_instance": llm_model,
    },
    "embeddings": {
        "model_instance": embedder_model,
    },
    "headless": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the articles",
    # also accepts a string with the already downloaded HTML code
    source="https://perinim.github.io/projects",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)

Error:

(venv) koushikvoggu@Koushiks-MBP-2 AI % /Users/koushikvoggu/Stk-Training/AI/python/venv/bin/python /Users/koushikvoggu/Stk-Training/AI/test_langchain.py
The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /Users/koushikvoggu/.cache/huggingface/token
Login successful
Traceback (most recent call last):
  File "/Users/koushikvoggu/Stk-Training/AI/test_langchain.py", line 450, in <module>
    smart_scraper_graph = SmartScraperGraph(
                          ^^^^^^^^^^^^^^^^^^
  File "/Users/koushikvoggu/Stk-Training/AI/python/venv/lib/python3.12/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 53, in __init__
    super().__init__(prompt, config, source, schema)
  File "/Users/koushikvoggu/Stk-Training/AI/python/venv/lib/python3.12/site-packages/scrapegraphai/graphs/abstract_graph.py", line 87, in __init__
    self.graph = self._create_graph()
                 ^^^^^^^^^^^^^^^^^^^^
  File "/Users/koushikvoggu/Stk-Training/AI/python/venv/lib/python3.12/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 78, in _create_graph
    "chunk_size": self.model_token
                  ^^^^^^^^^^^^^^^^
AttributeError: 'SmartScraperGraph' object has no attribute 'model_token'
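
The traceback only shows that `self.model_token` was read in `_create_graph` before anything had assigned it. One common way this kind of error arises is an attribute that is set on only one code path during construction. The toy classes below illustrate that pattern under the assumption that passing a ready-made `model_instance` skips the assignment. They are hypothetical stand-ins, not scrapegraphai's actual code, and the `model_tokens` key and the value 8192 are made up for the illustration.

```python
class HypotheticalGraph:
    """Toy stand-in for illustration only; NOT the real scrapegraphai class."""

    def __init__(self, config):
        self.llm = self._create_llm(config["llm"])
        self.graph = self._create_graph()

    def _create_llm(self, llm_config):
        if "model_instance" in llm_config:
            # Early return: model_token is never assigned on this path.
            return llm_config["model_instance"]
        self.model_token = 8192  # arbitrary illustrative value
        return "llm built from " + llm_config["model"]

    def _create_graph(self):
        # Raises AttributeError if _create_llm took the early return above.
        return {"chunk_size": self.model_token}


class PatchedGraph(HypotheticalGraph):
    """Same toy class, with a fallback so both paths define model_token."""

    DEFAULT_MODEL_TOKEN = 8192  # assumed default, for illustration only

    def _create_llm(self, llm_config):
        llm = super()._create_llm(llm_config)
        if not hasattr(self, "model_token"):
            # "model_tokens" is a hypothetical config key for this sketch.
            self.model_token = llm_config.get("model_tokens", self.DEFAULT_MODEL_TOKEN)
        return llm
```

With these toy classes, `HypotheticalGraph({"llm": {"model_instance": object()}})` raises the same kind of `AttributeError` at construction time, while `PatchedGraph` with the same config does not. If the real cause has this shape, it would also explain why both the Azure and Hugging Face reports involve `model_instance` and fail before `run()` is called.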

f-aguzzi commented on August 20, 2024

I just realized that this issue was fixed along with the repairs that VinciGit made for #434, so it can be closed.
