
msusazureaccelerators / azure-cognitive-search-azure-openai-accelerator

This project forked from pablomarin/gpt-azure-search-engine


Virtual Assistant - GPT Smart Search Engine - Bot Framework + Azure OpenAI + Azure AI Search + Azure SQL + Bing API + Azure Document Intelligence + LangChain + CosmosDB

Home Page: https://gptsmartsearchapp.azurewebsites.net

License: MIT License

Shell 0.27% Python 16.78% Jupyter Notebook 77.93% Bicep 4.44% PowerShell 0.53% Dockerfile 0.06%
azure-cognitive-search azure-openai bot-framework-v4 gpt-4 virtual-assistant-ai form-recognizer vector-search-engine

azure-cognitive-search-azure-openai-accelerator's Introduction


3- or 5-day POC VBD powered by: Azure AI Search + Azure OpenAI + Bot Framework + LangChain + Azure SQL + CosmosDB + Bing Search API + Document Intelligence SDK

Open in GitHub Codespaces Open in VS Code Dev Containers

Your organization requires a Multi-Channel Smart Chatbot and a search engine capable of comprehending diverse types of data scattered across various locations. Additionally, the conversational chatbot should be able to provide answers to inquiries, along with the source and an explanation of how and where the answer was obtained. In other words, you want a private and secure ChatGPT for your organization that can interpret, comprehend, and answer questions about your business data.

The goal of the POC is to show/prove the value of a GPT Virtual Assistant built with Azure Services, with your own data in your own environment. The deliverables are:

  1. Backend Bot API built with Bot Framework and exposed to multiple channels (Web Chat, MS Teams, SMS, Email, Slack, etc.)
  2. Frontend web application with a Search and a Bot UI.

The repo teaches you, step by step, how to build an OpenAI-based Smart Search Engine. Each notebook builds on the previous one, culminating in the two applications.

For Microsoft FTEs: This is a customer-funded VBD; below are the assets for the delivery.

Item Description Link
VBD SKU Info and Datasheet CSAM must dispatch it as "Customer Invested" against credits/hours of Unified Support Contract. Customer decides if 3 or 5 days. ESXP SKU page
VBD Accreditation for CSAs Links for CSAs to get the Accreditation needed to deliver the workshop Link 1 , Link 2
VBD 3-5 day POC Asset (IP) The MVP to be delivered (this GitHub repo) Azure-Cognitive-Search-Azure-OpenAI-Accelerator
VBD Workshop Deck The deck introducing and explaining the workshop Intro AOAI GPT Azure Smart Search Engine Accelerator.pptx
CSA Training Video 2-Hour Training for Microsoft CSAs POC VBD Training Recording (New video coming soon!)

Prerequisites: Client 3-5 Day POC

  • Azure subscription
  • An accepted application to Azure OpenAI, including GPT-4. If the customer does not have GPT-4 approved, Microsoft CSAs can lend theirs during the workshop
  • Microsoft members should preferably be added as Guests in the client's Azure AD. If that is not possible, the customer can issue corporate IDs to the Microsoft members
  • A Resource Group (RG) needs to be set up for this workshop POC in the customer's Azure tenant
  • The customer team and the Microsoft team must have Contributor permissions to this resource group so they can set everything up 2 weeks prior to the workshop
  • A storage account must be set up in the RG.
  • Customer data/documents must be uploaded to the blob storage account at least two weeks prior to the workshop date
  • A Multi-Tenant App Registration (Service Principal) must be created by the customer (save the Client Id and Secret Value).
  • The customer must provide the Microsoft team 10-20 questions (easy to hard) that they want the bot to answer correctly.
  • For IDE collaboration and standardization during the workshop, AML compute instances with Jupyter Lab will be used; for this, an Azure Machine Learning workspace must be deployed in the RG
    • Note: Please ensure you have enough core compute quota in your Azure Machine Learning workspace

Architecture

Architecture

Flow

  1. The user asks a question.
  2. In the app, an OpenAI GPT-4 LLM uses a clever prompt to determine which source to use based on the user input
  3. Six types of sources are available:
    • 3a. Azure SQL Database - contains COVID-related statistics in the US.
    • 3b. API Endpoints - RESTful OpenAPI 3.0 API containing up-to-date statistics about COVID.
    • 3c. Azure Bing Search API - provides access to the internet, allowing scenarios like Q&A on public websites.
    • 3d. Azure AI Search - contains AI-enriched documents from Blob Storage:
      • 10,000 arXiv Computer Science PDFs
      • 90,000 COVID publication abstracts
      • 5 lengthy PDF books
    • 3e. CSV Tabular File - contains COVID-related statistics in the US.
    • 3f. Kraken broker API for currencies
  4. The app retrieves the result from the source and crafts the answer.
  5. The tuple (Question and Answer) is saved to CosmosDB as persistent memory and for further analysis.
  6. The answer is delivered to the user.
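The routing in step 2 can be pictured as a dispatcher that maps a question to one of the tool names used elsewhere in this repo (@docsearch, @sqlsearch, @bing, @csvfile). The sketch below is illustrative only: the real app lets a GPT-4 agent choose the tool via LangChain, whereas this toy version substitutes a keyword heuristic for the LLM decision.

```python
# Illustrative sketch: in the real app a GPT-4 agent picks the tool;
# here a keyword heuristic stands in for the LLM routing decision.

def route_question(question: str) -> str:
    """Pick a data source name for a user question (toy heuristic)."""
    q = question.lower()
    if "covid" in q and "sql" in q:
        return "sqlsearch"   # Azure SQL Database with COVID stats
    if "http" in q or "website" in q:
        return "bing"        # Bing Search API for public websites
    if "csv" in q:
        return "csvfile"     # CSV tabular file tool
    return "docsearch"       # default: Azure AI Search over the documents

print(route_question("How many COVID cases are in the SQL database?"))  # -> sqlsearch
```

The actual implementation describes each tool in its prompt and lets the model choose, which generalizes far beyond any keyword list.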

Demo

https://gptsmartsearchapp.azurewebsites.net/


🔧Features

  • 100% Python.
  • Uses Azure Cognitive Services to index and enrich unstructured documents: OCR over images, Chunking and automated vectorization.
  • Uses Hybrid Search Capabilities of Azure AI Search to provide the best semantic answer (Text and Vector search combined).
  • Uses LangChain as a wrapper for interacting with Azure OpenAI , vector stores, constructing prompts and creating agents.
  • Multi-Lingual (ingests, indexes and understands any language)
  • Multi-Index -> multiple search indexes
  • Tabular Data Q&A with CSV files and SQL flavor Databases
  • Uses Azure AI Document Intelligence SDK (formerly Form Recognizer) to parse complex/large PDF documents
  • Uses Bing Search API to power internet searches and Q&A over public websites.
  • Connects to API Data sources by converting natural language questions to API calls.
  • Uses CosmosDB as persistent memory to save user's conversations.
  • Uses Streamlit to build the Frontend web application in python.
  • Uses Bot Framework and Bot Service to Host the Bot API Backend and to expose it to multiple channels including MS Teams.
  • Also uses LangServe/FastAPI to deploy an alternative backend API with streaming capabilities

Steps to Run the POC/Accelerator

Note: (Pre-requisite) You need to have an Azure OpenAI service already created

  1. Fork this repo to your Github account.

  2. In Azure OpenAI studio, deploy these models (older models than the ones stated below won't work):

    • "gpt-35-turbo-1106 (or newer)"
    • "gpt-4-turbo-1106 (or newer)"
    • "text-embedding-ada-002 (or newer)"
  3. Create a Resource Group where all the assets of this accelerator are going to be. Azure OpenAI can be in a different RG or a different Subscription.

  4. Click below to create all the Azure infrastructure needed to run the notebooks (Azure AI Search, Cognitive Services, etc.):

    Deploy To Azure

    Note: If you have never created an Azure AI Services multi-service account before, create one manually in the Azure portal so you can read and accept the Responsible AI terms. Once it is deployed, delete it, and then use the deployment button above.

  5. Clone your forked repo to your AML Compute Instance. If your repo is private, see the Troubleshooting section below for how to clone a private repo.

  6. Make sure you run the notebooks on a conda environment with Python 3.10 or newer

  7. Install the dependencies on your machine (make sure you run the pip command below in the same conda environment in which you are going to run the notebooks). For example, on an AML compute instance run:

    conda activate azureml_py310_sdkv2
    pip install -r ./common/requirements.txt
    

    You might get some pip dependency errors; that is OK, the libraries are installed correctly regardless of the errors.

  8. Edit the file credentials.env with your own values from the services created in step 4.

    • For BLOB_SAS_TOKEN and BLOB_CONNECTION_STRING, go to Storage Account > Security + networking > Shared access signature > Generate SAS.
  9. Run the Notebooks in order. They build up on top of each other.
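Step 8 assumes credentials.env is a flat KEY=VALUE file. If you want to load it in a script without python-dotenv, a minimal stdlib loader might look like the sketch below; the file name comes from step 8, and the parsing rules are a deliberate simplification (no export prefixes, multiline values, or escapes).

```python
import os

def load_env_file(path: str = "credentials.env") -> dict:
    """Load KEY=VALUE pairs from a .env-style file into os.environ.

    Minimal sketch: skips blank lines and '#' comments, splits on the
    first '=', and strips surrounding quotes from the value.
    """
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(values)  # make the values visible to the SDKs
    return values
```

In practice the notebooks use python-dotenv's `load_dotenv`, which handles the edge cases this sketch ignores.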


FAQs


  1. Why use Azure AI Search engine to provide the context for the LLM and not fine tune the LLM instead?

A: Quoting the OpenAI documentation: "GPT-3 has been pre-trained on a vast amount of text from the open internet. When given a prompt with just a few examples, it can often intuit what task you are trying to perform and generate a plausible completion. This is often called 'few-shot learning.' Fine-tuning improves on few-shot learning by training on many more examples than can fit in the prompt, letting you achieve better results on a wide number of tasks. Once a model has been fine-tuned, you won't need to provide examples in the prompt anymore. This saves costs and enables lower-latency requests."

However, fine-tuning the model requires providing hundreds or thousands of Prompt and Completion tuples, which are essentially query-response samples. The purpose of fine-tuning is not to give the LLM knowledge of the company's data but to provide it with examples so it can perform tasks really well without requiring examples on every prompt.

There are cases where fine-tuning is necessary, such as when the examples contain proprietary data that should not be exposed in prompts or when the language used is highly specialized, as in healthcare, pharmacy, or other industries or use cases where the language used is not commonly found on the internet.

Troubleshooting


Steps to clone a private repo:

ssh-keygen -t ed25519 -C "[email protected]"
cat ~/.ssh/id_ed25519.pub
# Then select and copy the contents of the id_ed25519.pub file
# displayed in the terminal to your clipboard
  • On GitHub, go to Settings-> SSH and GPG Keys-> New SSH Key
  • In the "Title" field, add a descriptive label for the new key, e.g. "AML Compute". In the "Key" field, paste your public key.
  • Clone your private repo
git clone [email protected]:YOUR-USERNAME/YOUR-REPOSITORY.git

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

azure-cognitive-search-azure-openai-accelerator's People

Contributors

amitmukh, dantelmomsft, erjadi, filipefumaux, giorgiosaez, github-actions[bot], juliaashleyh8, keayoub, lordaouy, maurominella, pablomarin, sethsteenken, tbunkley, tectonia

azure-cognitive-search-azure-openai-accelerator's Issues

03-Quering-AOpenAI.ipynb cell 6: Code fails if there are no answers

Needs to be changed to

from IPython.display import display, HTML

display(HTML('<h4>Top Answers</h4>'))

for index, search_results in agg_search_results.items():
    if '@search.answers' in search_results:  # check the same key that is read below
        for result in search_results['@search.answers']:
            if result['score'] > 0.5:  # show answers with at least 50% of the max possible score (1.0)
                display(HTML('<h5>Answer - score: ' + str(round(result['score'], 2)) + '</h5>'))
                display(HTML(result['text']))

No module named 'openai.error'

When deploying both the backend and frontend applications, there is an error, "No module named 'openai.error'", and the application cannot work.
The details are as follows:

ModuleNotFoundError: No module named 'openai.error'
Traceback:
File "/tmp/8dbe04b52defefa/antenv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "/tmp/8dbe04b52defefa/pages/1_Search.py", line 8, in <module>
from openai.error import OpenAIError
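For context on this issue: the openai Python package removed the openai.error module in v1.0, so pre-1.0 imports like the one in the traceback fail on newer installs. One option is to pin openai<1.0 in requirements.txt; another is a compatibility shim like the illustrative sketch below (not from the repo):

```python
# Compatibility shim: openai>=1.0 exposes OpenAIError at the package
# root, while openai<1.0 kept it in the (now removed) openai.error module.
try:
    from openai import OpenAIError          # openai >= 1.0
except ImportError:
    try:
        from openai.error import OpenAIError  # openai < 1.0
    except ImportError:
        class OpenAIError(Exception):
            """Fallback when the openai package is not installed."""
```

Code that catches `OpenAIError` can then run against either major version of the SDK.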

The bot's response was not as expected when using (docsearch)

I'm using Serve.py to run, and I use the internal document search tool (@docsearch). For questions that are not in the documents, I expect it to answer that it doesn't know or doesn't have the information, but currently it returns results based on its own knowledge. Can you help me solve this problem?

High costs due to Azure Cognitive Services

Hi,
initiating the demo, in particular the search index of Azure Cognitive Search, creates high costs: $1,300 in our case.

There is no warning or note. In my eyes this is a no-go and untrustworthy.

Notebook 10: missing ./data/openapi.json

Hi folks, the notebook fails. It is probably best to restore the old code:


url = 'https://disease.sh/apidocs/swagger_v3.json'
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    spec = response.json()
else:
    spec = None
    print(f"Failed to retrieve data: Status code {response.status_code}")

Backend app is not able to respond for SQL related question for @covidstats tool

The backend application does not ask for a SQL Server username, and although the template tries to connect to SQL Server with the administrator login, it is still not able to communicate and provide responses to SQL queries.

Testing connectivity and responses works fine in notebook 8-Smart_Agend.ipynb, but when the backend app is deployed, SQL does not work.

SQL connection is getting lost in the middle of the same session

The SQL connection is getting lost in the middle of a session, with the response below:

sql,

"I'm sorry for any confusion, but it seems there was a misunderstanding in my response. Without direct access to a live database or specific data ************** (related with question), I cannot provide the requested information. If you have access to a database and are looking for guidance on constructing a SQL query to retrieve this information, I can certainly help with that. Please let me know how I can assist you further."

The initial 2-3 questions get responses, but after that the connection is dropped.

pip install botbuilder-integration-aiohttp fails

Hi there,

I'm trying to set this up locally, but the requirements fail to install. It's the botbuilder-integration-aiohttp package that fails. See below. Any suggestions for this?

C:\GitHub\Azure-Cognitive-Search-Azure-OpenAI-Accelerator>pip install botbuilder-integration-aiohttp
Collecting botbuilder-integration-aiohttp
  Using cached botbuilder_integration_aiohttp-4.14.4-py3-none-any.whl (18 kB)
Collecting botbuilder-schema==4.14.4 (from botbuilder-integration-aiohttp)
  Using cached botbuilder_schema-4.14.4-py2.py3-none-any.whl (35 kB)
Collecting botframework-connector==4.14.4 (from botbuilder-integration-aiohttp)
  Using cached botframework_connector-4.14.4-py2.py3-none-any.whl (96 kB)
Collecting botbuilder-core==4.14.4 (from botbuilder-integration-aiohttp)
  Using cached botbuilder_core-4.14.4-py3-none-any.whl (114 kB)
Collecting yarl<=1.4.2 (from botbuilder-integration-aiohttp)
  Using cached yarl-1.4.2.tar.gz (163 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: aiohttp==3.8.4 in c:\python\lib\site-packages (from botbuilder-integration-aiohttp) (3.8.4)
Requirement already satisfied: attrs>=17.3.0 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (22.2.0)
Requirement already satisfied: charset-normalizer<4.0,>=2.0 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (3.1.0)
Requirement already satisfied: multidict<7.0,>=4.5 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (6.0.4)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (4.0.2)
Requirement already satisfied: frozenlist>=1.1.1 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (1.3.3)
Requirement already satisfied: aiosignal>=1.1.2 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (1.3.1)
Collecting botframework-streaming==4.14.4 (from botbuilder-core==4.14.4->botbuilder-integration-aiohttp)
  Using cached botframework_streaming-4.14.4-py3-none-any.whl (41 kB)
Collecting jsonpickle<1.5,>=1.2 (from botbuilder-core==4.14.4->botbuilder-integration-aiohttp)
  Using cached jsonpickle-1.4.2-py2.py3-none-any.whl (36 kB)
Collecting msrest==0.6.* (from botbuilder-schema==4.14.4->botbuilder-integration-aiohttp)
  Using cached msrest-0.6.21-py2.py3-none-any.whl (85 kB)
Requirement already satisfied: urllib3<2.0.0 in c:\python\lib\site-packages (from botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (1.26.15)
Requirement already satisfied: PyJWT>=2.4.0 in c:\python\lib\site-packages (from botframework-connector==4.14.4->botbuilder-integration-aiohttp) (2.6.0)
Requirement already satisfied: msal==1.* in c:\python\lib\site-packages (from botframework-connector==4.14.4->botbuilder-integration-aiohttp) (1.21.0)
Requirement already satisfied: requests<3,>=2.0.0 in c:\python\lib\site-packages (from msal==1.*->botframework-connector==4.14.4->botbuilder-integration-aiohttp) (2.28.2)
Requirement already satisfied: cryptography<41,>=0.6 in c:\python\lib\site-packages (from msal==1.*->botframework-connector==4.14.4->botbuilder-integration-aiohttp) (39.0.2)
Requirement already satisfied: requests-oauthlib>=0.5.0 in c:\python\lib\site-packages (from msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (1.3.1)
Requirement already satisfied: isodate>=0.6.0 in c:\python\lib\site-packages (from msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (0.6.1)
Requirement already satisfied: certifi>=2017.4.17 in c:\python\lib\site-packages (from msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (2022.12.7)
Requirement already satisfied: idna>=2.0 in c:\python\lib\site-packages (from yarl<=1.4.2->botbuilder-integration-aiohttp) (3.4)
Requirement already satisfied: cffi>=1.12 in c:\python\lib\site-packages (from cryptography<41,>=0.6->msal==1.*->botframework-connector==4.14.4->botbuilder-integration-aiohttp) (1.15.1)
Requirement already satisfied: six in c:\python\lib\site-packages (from isodate>=0.6.0->msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (1.16.0)
Requirement already satisfied: oauthlib>=3.0.0 in c:\python\lib\site-packages (from requests-oauthlib>=0.5.0->msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (3.2.2)
Requirement already satisfied: pycparser in c:\python\lib\site-packages (from cffi>=1.12->cryptography<41,>=0.6->msal==1.*->botframework-connector==4.14.4->botbuilder-integration-aiohttp) (2.21)
Building wheels for collected packages: yarl
  Building wheel for yarl (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for yarl (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [54 lines of output]
      C:\Users\pstrengholt\AppData\Local\Temp\pip-build-env-0y5swes6\overlay\Lib\site-packages\setuptools\config\setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
      !!

              ********************************************************************************
              The license_file parameter is deprecated, use license_files instead.

              By 2023-Oct-30, you need to update your project and remove deprecated calls
              or your builds will no longer be supported.

              See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
              ********************************************************************************

      !!
        parsed = self.parsers.get(option_name, lambda x: x)(value)
      **********************
      * Accellerated build *
      **********************
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-311
      creating build\lib.win-amd64-cpython-311\yarl
      copying yarl\quoting.py -> build\lib.win-amd64-cpython-311\yarl
      copying yarl\__init__.py -> build\lib.win-amd64-cpython-311\yarl
      running egg_info
      writing yarl.egg-info\PKG-INFO
      writing dependency_links to yarl.egg-info\dependency_links.txt
      writing requirements to yarl.egg-info\requires.txt
      writing top-level names to yarl.egg-info\top_level.txt
      reading manifest file 'yarl.egg-info\SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no previously-included files matching '*.pyc' found anywhere in distribution
      warning: no previously-included files matching '*.cache' found anywhere in distribution
      warning: no previously-included files found matching 'yarl\_quoting.html'
      warning: no previously-included files found matching 'yarl\_quoting.*.so'
      warning: no previously-included files found matching 'yarl\_quoting.pyd'
      warning: no previously-included files found matching 'yarl\_quoting.*.pyd'
      no previously-included directories found matching 'docs\_build'
      adding license file 'LICENSE'
      writing manifest file 'yarl.egg-info\SOURCES.txt'
      copying yarl\__init__.pyi -> build\lib.win-amd64-cpython-311\yarl
      copying yarl\_quoting.c -> build\lib.win-amd64-cpython-311\yarl
      copying yarl\_quoting.pyx -> build\lib.win-amd64-cpython-311\yarl
      copying yarl\py.typed -> build\lib.win-amd64-cpython-311\yarl
      running build_ext
      building 'yarl._quoting' extension
      creating build\temp.win-amd64-cpython-311
      creating build\temp.win-amd64-cpython-311\Release
      creating build\temp.win-amd64-cpython-311\Release\yarl
      "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Python\include -IC:\Python\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /Tcyarl/_quoting.c /Fobuild\temp.win-amd64-cpython-311\Release\yarl/_quoting.obj
      _quoting.c
      yarl/_quoting.c(196): fatal error C1083: Cannot open include file: 'longintrepr.h': No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.36.32532\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for yarl
Failed to build yarl
ERROR: Could not build wheels for yarl, which is required to install pyproject.toml-based projects

C:\GitHub\Azure-Cognitive-Search-Azure-OpenAI-Accelerator>

http://localhost:3978/api/messages request body

I want to test it by sending a POST request.
I started the app.py file and opened localhost:3978.
But when I tried to send the POST request, I got this error:
'raise DeserializationError("Cannot deserialize content-type: {}".format(content_type))
msrest.exceptions.DeserializationError: Cannot deserialize content-type: text/plain'

I think my request-body form is wrong.
I referred to the link below, but it didn't work:
https://learn.microsoft.com/en-us/azure/bot-service/rest-api/bot-framework-rest-connector-api-reference?view=azure-bot-service-4.0
Can I get a request-body example?
Please let me know if anybody knows about it.
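Regarding the error above: msrest raises DeserializationError when the POST body is not sent as JSON, so the request needs Content-Type: application/json and an Activity-shaped body. The sketch below builds such a request; the IDs and service URL are placeholders, and a real client sets more Activity fields (id, timestamp, channelId, etc.):

```python
import json

def build_message_request(text: str, token: str) -> tuple:
    """Build headers and body for POST /api/messages (minimal sketch)."""
    headers = {
        # text/plain is what triggers the DeserializationError above
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    # Minimal Bot Framework Activity; values below are placeholders.
    activity = {
        "type": "message",
        "text": text,
        "from": {"id": "user1"},
        "recipient": {"id": "bot"},
        "conversation": {"id": "conv1"},
        "serviceUrl": "https://example.invalid/",  # placeholder
    }
    return headers, json.dumps(activity)
```

Sending this body with any HTTP client (requests, curl) should get past the content-type check, though the 401 auth questions in other issues are a separate concern.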

How do I bring the CSVTabularTool in the frontend

I understand that in the demo it uses the sql connection to query of the tabular data. However, I'd like to include the CSVTabularTool as well.

I did the following:

  • in the bot.py i included the following:
    file_url = "testdata.csv"
    csv_search = CSVTabularTool(path=file_url, llm=llm, callback_manager=cb_manager, return_direct=True)

      and updated the tools list: 
      tools = [www_search,csv_search, sql_search, doc_search, chatgpt_search, book_search]
    

I stored the corresponding testdata.csv in the same folder structure.

When I deployed this and tried @csvfile, I got the following output (see attached image):

Notebook 11. ValueError: An output parsing error occurred.

Hey folks, I keep getting parsing errors for the csv agent in Notebook 11, despite setting the model to gpt4 and adding the handling_parsing_errors=True parameter in the utils.py:451

It is probably related to some regression in the langchain, I wasn't able to pin the root cause yet.


ValueError: An output parsing error occurred. In order to pass this error back to the agent and have it try again, pass `handle_parsing_errors=True` to the AgentExecutor. This is the error: Could not parse tool input: {'arguments': "import pandas as pd\n\ndf = pd.read_csv('covid_data.csv')\nrows = df.shape[0]\nrows", 'name': 'python'} because the `arguments` is not valid JSON.

Notebook 4: text-embedding-ada-002 is hardcoded in the index vectorizer

Right now the vectorizer definition hardcodes the ada embedding, and it breaks when you use text-embedding-3-small, for instance. It should be:

        "vectorizers": [
            {
                "name": "openai",
                "kind": "azureOpenAI",
                "azureOpenAIParameters":
                {
                    "resourceUri" : os.environ['AZURE_OPENAI_ENDPOINT'],
                    "apiKey" : os.environ['AZURE_OPENAI_API_KEY'],
                    "deploymentId" : os.environ['EMBEDDING_DEPLOYMENT_NAME']
                }
            }
        ],
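Building on the fix above, the vectorizer definition can be assembled in Python from environment variables so no embedding deployment name is hardcoded. This is an illustrative sketch using the same variable names as the snippet; the defaults are placeholders:

```python
import os

def build_vectorizer(name: str = "openai") -> dict:
    """Build an Azure AI Search vectorizer definition from env vars,
    so the embedding deployment is not hardcoded in the index."""
    return {
        "name": name,
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
            "resourceUri": os.environ.get("AZURE_OPENAI_ENDPOINT", ""),
            "apiKey": os.environ.get("AZURE_OPENAI_API_KEY", ""),
            "deploymentId": os.environ.get("EMBEDDING_DEPLOYMENT_NAME", ""),
        },
    }
```

Switching embedding models then only requires changing EMBEDDING_DEPLOYMENT_NAME in credentials.env and re-creating the index.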

Code understanding

Hi,
I just want to understand how this code works, how the front end and back end interact, and how the front end takes part in displaying results.

Unauthorized issue when sending a request to 'http://localhost:3978/api/messages'

I run a local server using 'python app.py'.
So I opened my local server and I want to send a request to http://localhost:3978/api/messages.

I generated an access token for the Bot Framework API using https://login.microsoftonline.com/botframework.com/oauth2/v2.0/token with:
Content-Type: application/x-www-form-urlencoded
grant_type=client_credentials
client_id=MY_MICROSOFT-APP-ID
client_secret=MY_MICROSOFT-APP-PASSWORD(Client_Secret)
scope=https://api.botframework.com/.default

The access token was issued and I sent a request to http://localhost:3978/api/messages with the issued token,
but I got a '401: Unauthorized' response.

How can I solve this problem?

Where should I set the table_info parameter?

Hello. In Prompts.py around line 294, there is a variable called table_info, for designating which tables the agent should search. It is not obvious in your code where this variable should be set.

Getting error pydantic.error_wrappers.ValidationError: 1 validation error for AIMessage when trying to interact in live chat with @docsearch after deploying to the web app

The bot works fine locally with @docsearch; however, when running in the web app I am getting an error.

Log from the web app logstream:

2023-10-30T11:26:48.002567988Z
2023-10-30T11:26:48.002588789Z [on_turn_error] unhandled error: 1 validation error for AIMessage
2023-10-30T11:26:48.002594489Z content
2023-10-30T11:26:48.002624989Z none is not an allowed value (type=type_error.none.not_allowed)
2023-10-30T11:26:48.019385627Z Traceback (most recent call last):
2023-10-30T11:26:48.020082732Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/botbuilder/core/bot_adapter.py", line 128, in run_pipeline
2023-10-30T11:26:48.020100033Z return await self._middleware.receive_activity_with_status(
2023-10-30T11:26:48.020962740Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/botbuilder/core/middleware_set.py", line 69, in receive_activity_with_status
2023-10-30T11:26:48.020980040Z return await self.receive_activity_internal(context, callback)
2023-10-30T11:26:48.020985140Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/botbuilder/core/middleware_set.py", line 79, in receive_activity_internal
2023-10-30T11:26:48.020989140Z return await callback(context)
2023-10-30T11:26:48.021769646Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/botbuilder/core/activity_handler.py", line 70, in on_turn
2023-10-30T11:26:48.021803747Z await self.on_message_activity(turn_context)
2023-10-30T11:26:48.037765778Z File "/tmp/8dbd938a08a3f74/bot.py", line 110, in on_message_activity
2023-10-30T11:26:48.037786278Z answer = await loop.run_in_executor(ThreadPoolExecutor(), run_agent, input_text, agent_chain)
2023-10-30T11:26:48.037864979Z File "/opt/python/3.10.12/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2023-10-30T11:26:48.037873179Z result = self.fn(*self.args, **self.kwargs)
2023-10-30T11:26:48.037876779Z File "/tmp/8dbd938a08a3f74/utils.py", line 485, in run_agent
2023-10-30T11:26:48.037880479Z x=agent_chain.run(input=question)
2023-10-30T11:26:48.037883779Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/chains/base.py", line 480, in run
2023-10-30T11:26:48.037887379Z return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
2023-10-30T11:26:48.037890679Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/chains/base.py", line 284, in __call__
2023-10-30T11:26:48.037894179Z final_outputs: Dict[str, Any] = self.prep_outputs(
2023-10-30T11:26:48.037897479Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in prep_outputs
2023-10-30T11:26:48.037900879Z self.memory.save_context(inputs, outputs)
2023-10-30T11:26:48.037904279Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/memory/chat_memory.py", line 38, in save_context
2023-10-30T11:26:48.037907779Z self.chat_memory.add_ai_message(output_str)
2023-10-30T11:26:48.037911079Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/schema/memory.py", line 108, in add_ai_message
2023-10-30T11:26:48.037914579Z self.add_message(AIMessage(content=message))
2023-10-30T11:26:48.037917879Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/load/serializable.py", line 74, in __init__
2023-10-30T11:26:48.037921379Z super().__init__(**kwargs)
2023-10-30T11:26:48.037924679Z File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
2023-10-30T11:26:48.037936279Z pydantic.error_wrappers.ValidationError: 1 validation error for AIMessage
2023-10-30T11:26:48.037939779Z content
2023-10-30T11:26:48.037942979Z none is not an allowed value (type=type_error.none.not_allowed)

The live chat works fine with @sqlsearch and @bing; it only fails with @docsearch.
However, @docsearch works fine locally.


what can be the issue here?
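The last frame in the traceback shows LangChain's memory constructing `AIMessage(content=None)`, i.e. the agent returned `None` as its answer (which can happen when the model call fails silently only in the deployed environment, e.g. content filtering or a deployment-name mismatch). A minimal defensive sketch for `run_agent` in `utils.py` (the fallback text is my own, not from the repo):

```python
def run_agent(question, agent_chain):
    """Run the agent, guarding against a None answer that would otherwise
    crash LangChain's memory when it builds AIMessage(content=None)."""
    answer = agent_chain.run(input=question)
    if answer is None:
        # Assumed fallback text; adjust to taste. The real fix is to find
        # out why the model returns nothing in the deployed environment.
        answer = "I'm sorry, I could not find an answer to that question."
    return answer
```

This only prevents the crash; comparing the Azure app settings (deployment names, API versions) against the working local `.env` is the way to find the root cause.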

cannot import name 'update_vector_indexes' from 'utils'

When I build the frontend web app from the latest repo, the search page fails with the following error:
ImportError: cannot import name 'update_vector_indexes' from 'utils' (/tmp/8dc4360446125ac/utils.py)
Does anyone know what the issue is?
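This usually means the frontend pages were updated to call `update_vector_indexes` while the deployed `utils.py` comes from an older commit (or vice versa); redeploying both from the same commit is the real fix. As a stopgap while debugging, a hedged sketch of a defensive import:

```python
# Degrade gracefully if the deployed utils.py predates this helper.
try:
    from utils import update_vector_indexes
except ImportError:
    update_vector_indexes = None  # older utils.py without this function
```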

"Properties is required for this resource" error when deploying bicep

The following resource is missing the properties field and fails during deployment:

"{"code":"BadRequest","message":"Properties is required for this resource."}"

https://github.com/MSUSAzureAccelerators/Azure-Cognitive-Search-Azure-OpenAI-Accelerator/blob/406038c0e8074b5b43cb995a21e4dcb39e1f1653/azuredeploy.bicep#L197C61-L197C69

The documentation shows the resource should have the following section:

resource formRecognizerAccount 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: formRecognizerName
  location: location
  sku: {
    name: 'S0'
  }
  kind: 'FormRecognizer'
  properties: {}
}

https://learn.microsoft.com/en-us/azure/templates/microsoft.cognitiveservices/2023-05-01/accounts?pivots=deployment-language-bicep

Slowness with latest update; brain agent requires tool name in question

We've noticed that the latest updates have resulted in a decrease in response speed. Our bot, built on this code with two different tools, uses GPT-4 with a capacity of 100 PTUs. Despite this, we're seeing response times of 30-40 seconds for both RAG search and SQL queries. We've also found that omitting the tool name from the question leads to inadequate responses, an issue that wasn't present before the recent updates.
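Before tuning anything, it helps to confirm which part of the pipeline is slow. A minimal, generic timing wrapper (not from the repo) that can be applied to the agent call or to individual tool functions:

```python
import time
from functools import wraps

def timed(fn):
    """Print how long each wrapped call takes, to pinpoint the slow step."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.2f}s")
        return result
    return wrapper
```

Wrapping the RAG tool and the SQL tool separately should show whether the latency comes from the model round-trips or from the search/database side.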

High costs

We have deployed this Repository using our own data. GPT-4-32k is the model that is used.

We found that Azure OpenAI is extremely expensive when using this solution.
One example:
2 questions --> $1.80

Even Azure Cognitive Search is around $7 a day, and the frontend and backend apps are each around $15 a day.

If more people in the organisation were to use this tool, it would get expensive very fast.

Is there a way to lower the cost of using this solution?

Thank you in advance :)
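For a rough sanity check on the $1.80 figure: GPT-4-32k list prices at the time were roughly $0.06 per 1K prompt tokens and $0.12 per 1K completion tokens (verify against current Azure pricing). An agent with a large system prompt and several tool round-trips can easily consume over 10K tokens per question, which is consistent with ~$0.90/question. A quick back-of-the-envelope estimator, with those assumed prices:

```python
# Assumed GPT-4-32k list prices per 1K tokens; check current Azure pricing.
PRICE_PER_1K_PROMPT = 0.06
PRICE_PER_1K_COMPLETION = 0.12

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Rough per-request cost estimate in USD."""
    return (prompt_tokens / 1000) * PRICE_PER_1K_PROMPT \
         + (completion_tokens / 1000) * PRICE_PER_1K_COMPLETION

# e.g. a 14K-token prompt plus a 500-token answer is about $0.90
```

The usual levers are routing most turns to a cheaper model (GPT-3.5-Turbo or GPT-4-Turbo), trimming the agent's system prompt and tool descriptions, and choosing lower tiers for Azure AI Search and the App Service plans.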

Blob search credentials do not work

Hi folks, I ran into an issue where the connection string wasn't working and the indexer indexed 0 files. I changed the connection string, removing all endpoints except the Blob endpoint, and now it works. I'll send a pull request.
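For reference, one common shape of a blob-only connection string (placeholders, not real values; the exact form depends on whether you authenticate with an account key or a SAS token):

```text
BlobEndpoint=https://<storage-account>.blob.core.windows.net/;SharedAccessSignature=<sas-token>
```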

Request for Meeting to Demo Azure-Cognitive-Search-Azure-OpenAI-Accelerator

Dear Sir

I hope this finds you well. I recently came across an innovative solution, namely the Azure-Cognitive-Search-Azure-OpenAI-Accelerator, and I am truly impressed by the potential it holds.

To gain a deeper understanding of the functionalities and capabilities of your Azure-Cognitive-Search-Azure-OpenAI-Accelerator, I would like to request a meeting for a personalized demonstration. I'm looking to develop this kind of solution in Azure.

Could I arrange a demonstration of your solution at our premises or through an online conference? I'm flexible regarding the format and willing to accommodate your availability.

Thank you in advance for considering my request. I am eager to learn more about your solution and explore how it could benefit our company. Please feel free to reach me via email to discuss the details and scheduling of the meeting.

Looking forward to your response. Have a great day!

Best regards,

Notebook 4: wrong model for embedder

Hi!

In Notebook 4, it should be:

embedder = AzureOpenAIEmbeddings(deployment=os.environ["EMBEDDING_DEPLOYMENT_NAME"], chunk_size=1) 

instead of

embedder = AzureOpenAIEmbeddings(deployment=os.environ["GPT35_DEPLOYMENT_NAME"], chunk_size=1) 

1 million documents

  • contains AI-enriched documents from Blob Storage (10k PDFs and 90k articles).
    This accelerator is already working with the 10k PDFs and 90k documents; does it also support 1 million documents? Please confirm before we proceed.

the demo does not seem to have any contextual capability for chatgpt

When I ask a follow-up question, the bot doesn't seem to maintain context. I understand this should be a feature of GPT; at least it should remember a certain number of previous questions.
For example: when I asked it to list 10 news websites, it worked well, but when I then asked "need more", it didn't know what to do at all.
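The repo's bot does persist conversation memory (the stack lists CosmosDB for chat history), so the hosted demo may simply be running without memory wired up. The underlying idea is just a sliding window over past turns that gets prepended to each prompt; a minimal, framework-free sketch of that idea (not the repo's actual mechanism):

```python
from collections import deque

class ChatHistory:
    """Keep the last max_turns exchanges so follow-ups like 'need more'
    still have context when the next prompt is built."""

    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def as_prompt(self) -> str:
        # Rendered history to prepend to the model prompt.
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)
```

In LangChain terms this corresponds to conversation-buffer-style memory attached to the agent; the point is just that the model only "remembers" what is re-sent with each request.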
