
msusazureaccelerators / azure-cognitive-search-azure-openai-accelerator

This project forked from pablomarin/gpt-azure-search-engine


Virtual Assistant - GPT Smart Search Engine - Bot Framework + Azure OpenAI + Azure AI Search + Azure SQL + Bing API + Azure Document Intelligence + LangChain + CosmosDB

Home Page: https://gptsmartsearchapp.azurewebsites.net

License: MIT License

Shell 0.27% Python 16.78% Jupyter Notebook 77.93% Bicep 4.44% PowerShell 0.53% Dockerfile 0.06%
azure-cognitive-search azure-openai bot-framework-v4 gpt-4 virtual-assistant-ai form-recognizer vector-search-engine

azure-cognitive-search-azure-openai-accelerator's Introduction


3- or 5-day POC VBD powered by: Azure AI Search + Azure OpenAI + Bot Framework + LangChain + Azure SQL + CosmosDB + Bing Search API + Document Intelligence SDK

Open in GitHub Codespaces Open in VS Code Dev Containers

Your organization requires a Multi-Channel Smart Chatbot and a search engine capable of comprehending diverse types of data scattered across various locations. Additionally, the conversational chatbot should be able to provide answers to inquiries, along with the source and an explanation of how and where the answer was obtained. In other words, you want a private and secure ChatGPT for your organization that can interpret, comprehend, and answer questions about your business data.

The goal of the POC is to show/prove the value of a GPT Virtual Assistant built with Azure Services, with your own data in your own environment. The deliverables are:

  1. Backend Bot API built with Bot Framework and exposed to multiple channels (Web Chat, MS Teams, SMS, Email, Slack, etc.)
  2. Frontend web application with a Search and a Bot UI.

The repo teaches you, step by step, how to build an OpenAI-based Smart Search Engine. Each notebook builds on the previous one, culminating in the two applications.

For Microsoft FTEs: This is a customer-funded VBD; below are the assets for the delivery.

Item Description Link
VBD SKU Info and Datasheet CSAM must dispatch it as "Customer Invested" against credits/hours of Unified Support Contract. Customer decides if 3 or 5 days. ESXP SKU page
VBD Accreditation for CSAs Links for CSAs to get the Accreditation needed to deliver the workshop Link 1 , Link 2
VBD 3-5 day POC Asset (IP) The MVP to be delivered (this GitHub repo) Azure-Cognitive-Search-Azure-OpenAI-Accelerator
VBD Workshop Deck The deck introducing and explaining the workshop Intro AOAI GPT Azure Smart Search Engine Accelerator.pptx
CSA Training Video 2-Hour Training for Microsoft CSAs POC VBD Training Recording (New video coming soon!)

Prerequisites: Client 3-5 Day POC

  • Azure subscription
  • An accepted application to Azure OpenAI, including GPT-4. If the customer does not have GPT-4 approved, Microsoft CSAs can lend theirs during the workshop
  • Microsoft members should preferably be added as Guests in the client's Azure AD. If that is not possible, the customer can issue corporate IDs to the Microsoft members
  • A Resource Group (RG) needs to be set up for this workshop POC in the customer's Azure tenant
  • The customer team and the Microsoft team must have Contributor permissions to this resource group so they can set everything up 2 weeks prior to the workshop
  • A storage account must be set up in the RG.
  • Customer data/documents must be uploaded to the blob storage account at least two weeks prior to the workshop date
  • A Multi-Tenant App Registration (Service Principal) must be created by the customer (save the Client Id and Secret Value).
  • The customer must provide the Microsoft team 10-20 questions (easy to hard) that they want the bot to answer correctly.
  • For IDE collaboration and standardization during the workshop, AML compute instances with Jupyter Lab will be used; for this, an Azure Machine Learning workspace must be deployed in the RG
    • Note: Please ensure you have enough core compute quota in your Azure Machine Learning workspace

Architecture

Architecture

Flow

  1. The user asks a question.
  2. In the app, an OpenAI GPT-4 LLM uses a clever prompt to determine which source to use based on the user input
  3. Six types of sources are available:
    • 3a. Azure SQL Database - contains COVID-related statistics in the US.
    • 3b. API Endpoints - RESTful OpenAPI 3.0 API containing up-to-date statistics about COVID.
    • 3c. Azure Bing Search API - provides access to the internet, allowing scenarios like Q&A on public websites.
    • 3d. Azure AI Search - contains AI-enriched documents from Blob Storage:
      • 10,000 arXiv Computer Science PDFs
      • 90,000 COVID publication abstracts
      • 5 lengthy PDF books
    • 3e. CSV Tabular File - contains COVID-related statistics in the US.
    • 3f. Kraken broker API for currencies
  4. The app retrieves the result from the source and crafts the answer.
  5. The tuple (Question and Answer) is saved to CosmosDB as persistent memory and for further analysis.
  6. The answer is delivered to the user.
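The routing in step 2 can be pictured as a dispatcher that maps a question to one of the tool names used elsewhere in this repo (@docsearch, @sqlsearch, @bing, @csvfile). The sketch below is illustrative only: the real app lets a GPT-4 agent choose the tool via LangChain, whereas this toy version substitutes a keyword heuristic for the LLM decision.

```python
# Illustrative sketch: in the real app a GPT-4 agent picks the tool;
# here a keyword heuristic stands in for the LLM routing decision.

def route_question(question: str) -> str:
    """Pick a data source name for a user question (toy heuristic)."""
    q = question.lower()
    if "covid" in q and "sql" in q:
        return "sqlsearch"   # Azure SQL Database with COVID stats
    if "http" in q or "website" in q:
        return "bing"        # Bing Search API for public websites
    if "csv" in q:
        return "csvfile"     # CSV tabular file tool
    return "docsearch"       # default: Azure AI Search over the documents

print(route_question("How many COVID cases are in the SQL database?"))  # -> sqlsearch
```

The actual implementation describes each tool in its prompt and lets the model choose, which generalizes far beyond any keyword list.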

Demo

https://gptsmartsearchapp.azurewebsites.net/


🔧Features

  • 100% Python.
  • Uses Azure Cognitive Services to index and enrich unstructured documents: OCR over images, Chunking and automated vectorization.
  • Uses Hybrid Search Capabilities of Azure AI Search to provide the best semantic answer (Text and Vector search combined).
  • Uses LangChain as a wrapper for interacting with Azure OpenAI , vector stores, constructing prompts and creating agents.
  • Multi-Lingual (ingests, indexes and understands any language)
  • Multi-Index -> multiple search indexes
  • Tabular Data Q&A with CSV files and SQL flavor Databases
  • Uses Azure AI Document Intelligence SDK (formerly Form Recognizer) to parse complex/large PDF documents
  • Uses Bing Search API to power internet searches and Q&A over public websites.
  • Connects to API Data sources by converting natural language questions to API calls.
  • Uses CosmosDB as persistent memory to save user's conversations.
  • Uses Streamlit to build the Frontend web application in python.
  • Uses Bot Framework and Bot Service to Host the Bot API Backend and to expose it to multiple channels including MS Teams.
  • Also uses LangServe/FastAPI to deploy an alternative backend API with streaming capabilities

Steps to Run the POC/Accelerator

Note: (Pre-requisite) You need to have an Azure OpenAI service already created

  1. Fork this repo to your Github account.

  2. In Azure OpenAI studio, deploy these models (older models than the ones stated below won't work):

    • "gpt-35-turbo-1106 (or newer)"
    • "gpt-4-turbo-1106 (or newer)"
    • "text-embedding-ada-002 (or newer)"
  3. Create a Resource Group where all the assets of this accelerator are going to be. Azure OpenAI can be in a different RG or a different Subscription.

  4. Click below to create all the Azure infrastructure needed to run the notebooks (Azure AI Search, Cognitive Services, etc.):

    Deploy To Azure

    Note: If you have never created an Azure AI Services multi-service account before, create one manually in the Azure portal so you can read and accept the Responsible AI terms. Once it is deployed, delete it, and then use the deployment button above.

  5. Clone your forked repo to your AML Compute Instance. If your repo is private, see the Troubleshooting section below for how to clone a private repo.

  6. Make sure you run the notebooks on a conda environment with Python 3.10 or newer

  7. Install the dependencies on your machine (make sure you run the pip command below in the same conda environment in which you are going to run the notebooks). For example, on an AML compute instance run:

    conda activate azureml_py310_sdkv2
    pip install -r ./common/requirements.txt
    

    You might get some pip dependency errors; that is OK, the libraries are installed correctly regardless of the errors.

  8. Edit the file credentials.env with your own values from the services created in step 4.

    • For BLOB_SAS_TOKEN and BLOB_CONNECTION_STRING, go to Storage Account > Security + networking > Shared access signature > Generate SAS.
  9. Run the Notebooks in order. They build up on top of each other.
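Step 8 assumes credentials.env is a flat KEY=VALUE file. If you want to load it in a script without python-dotenv, a minimal stdlib loader might look like the sketch below; the file name comes from step 8, and the parsing rules are a deliberate simplification (no export prefixes, multiline values, or escapes).

```python
import os

def load_env_file(path: str = "credentials.env") -> dict:
    """Load KEY=VALUE pairs from a .env-style file into os.environ.

    Minimal sketch: skips blank lines and '#' comments, splits on the
    first '=', and strips surrounding quotes from the value.
    """
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(values)  # make the values visible to the SDKs
    return values
```

In practice the notebooks use python-dotenv's `load_dotenv`, which handles the edge cases this sketch ignores.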


FAQs


  1. Why use Azure AI Search engine to provide the context for the LLM and not fine tune the LLM instead?

A: Quoting the OpenAI documentation: "GPT-3 has been pre-trained on a vast amount of text from the open internet. When given a prompt with just a few examples, it can often intuit what task you are trying to perform and generate a plausible completion. This is often called 'few-shot learning.' Fine-tuning improves on few-shot learning by training on many more examples than can fit in the prompt, letting you achieve better results on a wide number of tasks. Once a model has been fine-tuned, you won't need to provide examples in the prompt anymore. This saves costs and enables lower-latency requests."

However, fine-tuning the model requires providing hundreds or thousands of Prompt and Completion tuples, which are essentially query-response samples. The purpose of fine-tuning is not to give the LLM knowledge of the company's data but to provide it with examples so it can perform tasks really well without requiring examples on every prompt.

There are cases where fine-tuning is necessary, such as when the examples contain proprietary data that should not be exposed in prompts or when the language used is highly specialized, as in healthcare, pharmacy, or other industries or use cases where the language used is not commonly found on the internet.

Troubleshooting


Steps to clone a private repo:

ssh-keygen -t ed25519 -C "[email protected]"
cat ~/.ssh/id_ed25519.pub
# Then select and copy the contents of the id_ed25519.pub file
# displayed in the terminal to your clipboard
  • On GitHub, go to Settings-> SSH and GPG Keys-> New SSH Key
  • In the "Title" field, add a descriptive label for the new key, e.g. "AML Compute". In the "Key" field, paste your public key.
  • Clone your private repo
git clone [email protected]:YOUR-USERNAME/YOUR-REPOSITORY.git

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

azure-cognitive-search-azure-openai-accelerator's People

Contributors

amitmukh, dantelmomsft, erjadi, filipefumaux, giorgiosaez, github-actions[bot], juliaashleyh8, keayoub, lordaouy, maurominella, pablomarin, sethsteenken, tbunkley, tectonia

azure-cognitive-search-azure-openai-accelerator's Issues

03-Quering-AOpenAI.ipynb cell 6: Code fails if there are no answers

Needs to be changed to

from IPython.display import display, HTML

display(HTML('<h4>Top Answers</h4>'))

for index, search_results in agg_search_results.items():
    if '@search.answers' in search_results:  # check the same key that is read below
        for result in search_results['@search.answers']:
            if result['score'] > 0.5:  # show answers with at least 50% of the max possible score (1.0)
                display(HTML('<h5>Answer - score: ' + str(round(result['score'], 2)) + '</h5>'))
                display(HTML(result['text']))

No module named 'openai.error'

When deploying both the backend and frontend applications, there is an error, "No module named 'openai.error'", and the application cannot work.
The details are as follows:

ModuleNotFoundError: No module named 'openai.error'
Traceback:
File "/tmp/8dbe04b52defefa/antenv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "/tmp/8dbe04b52defefa/pages/1_Search.py", line 8, in <module>
from openai.error import OpenAIError
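For context on this issue: the openai Python package removed the openai.error module in v1.0, so pre-1.0 imports like the one in the traceback fail on newer installs. One option is to pin openai<1.0 in requirements.txt; another is a compatibility shim like the illustrative sketch below (not from the repo):

```python
# Compatibility shim: openai>=1.0 exposes OpenAIError at the package
# root, while openai<1.0 kept it in the (now removed) openai.error module.
try:
    from openai import OpenAIError          # openai >= 1.0
except ImportError:
    try:
        from openai.error import OpenAIError  # openai < 1.0
    except ImportError:
        class OpenAIError(Exception):
            """Fallback when the openai package is not installed."""
```

Code that catches `OpenAIError` can then run against either major version of the SDK.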

The bot's response was not as expected when using (docsearch)

I'm using Serve.py to run, and I use the internal document search tool (@docsearch). For questions that are not in the documents, I expect it to answer that it doesn't know or doesn't have the information, but currently it returns results based on its own knowledge. Can you help me solve this problem?

High costs due to Azure Cognitive Services

Hi,
initiating the demo, in particular the search index of Azure Cognitive Search, creates high costs: $1,300 in our case.

There is no warning or note. In my eyes this is a no-go and untrustworthy.

Notebook 10: missing ./data/openapi.json

Hi folks, the notebook fails. It is probably best to restore the old code:


url = 'https://disease.sh/apidocs/swagger_v3.json'
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    spec = response.json()
else:
    spec = None
    print(f"Failed to retrieve data: Status code {response.status_code}")

Backend app is not able to respond for SQL related question for @covidstats tool

The backend application does not ask for a SQL Server username, and although the template tries to connect to SQL Server with the administrator login, it is still not able to communicate and provide responses to SQL queries.

Testing connectivity and responses works fine in notebook 8-Smart_Agend.ipynb, but when the backend app is deployed, SQL does not work.

SQL connection is getting lost in the middle of the same session

The SQL connection is getting lost in the middle of a session, with the response below:

sql,

"I'm sorry for any confusion, but it seems there was a misunderstanding in my response. Without direct access to a live database or specific data ************** (related with question), I cannot provide the requested information. If you have access to a database and are looking for guidance on constructing a SQL query to retrieve this information, I can certainly help with that. Please let me know how I can assist you further."

The initial 2-3 questions get responses, but after that the connection is dropped.

pip install botbuilder-integration-aiohttp fails

Hi there,

I'm trying to set this up locally, but the requirements fail to install. It's the botbuilder-integration-aiohttp package that fails. See below. Any suggestions for this?

C:\GitHub\Azure-Cognitive-Search-Azure-OpenAI-Accelerator>pip install botbuilder-integration-aiohttp
Collecting botbuilder-integration-aiohttp
  Using cached botbuilder_integration_aiohttp-4.14.4-py3-none-any.whl (18 kB)
Collecting botbuilder-schema==4.14.4 (from botbuilder-integration-aiohttp)
  Using cached botbuilder_schema-4.14.4-py2.py3-none-any.whl (35 kB)
Collecting botframework-connector==4.14.4 (from botbuilder-integration-aiohttp)
  Using cached botframework_connector-4.14.4-py2.py3-none-any.whl (96 kB)
Collecting botbuilder-core==4.14.4 (from botbuilder-integration-aiohttp)
  Using cached botbuilder_core-4.14.4-py3-none-any.whl (114 kB)
Collecting yarl<=1.4.2 (from botbuilder-integration-aiohttp)
  Using cached yarl-1.4.2.tar.gz (163 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: aiohttp==3.8.4 in c:\python\lib\site-packages (from botbuilder-integration-aiohttp) (3.8.4)
Requirement already satisfied: attrs>=17.3.0 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (22.2.0)
Requirement already satisfied: charset-normalizer<4.0,>=2.0 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (3.1.0)
Requirement already satisfied: multidict<7.0,>=4.5 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (6.0.4)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (4.0.2)
Requirement already satisfied: frozenlist>=1.1.1 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (1.3.3)
Requirement already satisfied: aiosignal>=1.1.2 in c:\python\lib\site-packages (from aiohttp==3.8.4->botbuilder-integration-aiohttp) (1.3.1)
Collecting botframework-streaming==4.14.4 (from botbuilder-core==4.14.4->botbuilder-integration-aiohttp)
  Using cached botframework_streaming-4.14.4-py3-none-any.whl (41 kB)
Collecting jsonpickle<1.5,>=1.2 (from botbuilder-core==4.14.4->botbuilder-integration-aiohttp)
  Using cached jsonpickle-1.4.2-py2.py3-none-any.whl (36 kB)
Collecting msrest==0.6.* (from botbuilder-schema==4.14.4->botbuilder-integration-aiohttp)
  Using cached msrest-0.6.21-py2.py3-none-any.whl (85 kB)
Requirement already satisfied: urllib3<2.0.0 in c:\python\lib\site-packages (from botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (1.26.15)
Requirement already satisfied: PyJWT>=2.4.0 in c:\python\lib\site-packages (from botframework-connector==4.14.4->botbuilder-integration-aiohttp) (2.6.0)
Requirement already satisfied: msal==1.* in c:\python\lib\site-packages (from botframework-connector==4.14.4->botbuilder-integration-aiohttp) (1.21.0)
Requirement already satisfied: requests<3,>=2.0.0 in c:\python\lib\site-packages (from msal==1.*->botframework-connector==4.14.4->botbuilder-integration-aiohttp) (2.28.2)
Requirement already satisfied: cryptography<41,>=0.6 in c:\python\lib\site-packages (from msal==1.*->botframework-connector==4.14.4->botbuilder-integration-aiohttp) (39.0.2)
Requirement already satisfied: requests-oauthlib>=0.5.0 in c:\python\lib\site-packages (from msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (1.3.1)
Requirement already satisfied: isodate>=0.6.0 in c:\python\lib\site-packages (from msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (0.6.1)
Requirement already satisfied: certifi>=2017.4.17 in c:\python\lib\site-packages (from msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (2022.12.7)
Requirement already satisfied: idna>=2.0 in c:\python\lib\site-packages (from yarl<=1.4.2->botbuilder-integration-aiohttp) (3.4)
Requirement already satisfied: cffi>=1.12 in c:\python\lib\site-packages (from cryptography<41,>=0.6->msal==1.*->botframework-connector==4.14.4->botbuilder-integration-aiohttp) (1.15.1)
Requirement already satisfied: six in c:\python\lib\site-packages (from isodate>=0.6.0->msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (1.16.0)
Requirement already satisfied: oauthlib>=3.0.0 in c:\python\lib\site-packages (from requests-oauthlib>=0.5.0->msrest==0.6.*->botbuilder-schema==4.14.4->botbuilder-integration-aiohttp) (3.2.2)
Requirement already satisfied: pycparser in c:\python\lib\site-packages (from cffi>=1.12->cryptography<41,>=0.6->msal==1.*->botframework-connector==4.14.4->botbuilder-integration-aiohttp) (2.21)
Building wheels for collected packages: yarl
  Building wheel for yarl (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for yarl (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [54 lines of output]
      C:\Users\pstrengholt\AppData\Local\Temp\pip-build-env-0y5swes6\overlay\Lib\site-packages\setuptools\config\setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
      !!

              ********************************************************************************
              The license_file parameter is deprecated, use license_files instead.

              By 2023-Oct-30, you need to update your project and remove deprecated calls
              or your builds will no longer be supported.

              See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
              ********************************************************************************

      !!
        parsed = self.parsers.get(option_name, lambda x: x)(value)
      **********************
      * Accellerated build *
      **********************
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-311
      creating build\lib.win-amd64-cpython-311\yarl
      copying yarl\quoting.py -> build\lib.win-amd64-cpython-311\yarl
      copying yarl\__init__.py -> build\lib.win-amd64-cpython-311\yarl
      running egg_info
      writing yarl.egg-info\PKG-INFO
      writing dependency_links to yarl.egg-info\dependency_links.txt
      writing requirements to yarl.egg-info\requires.txt
      writing top-level names to yarl.egg-info\top_level.txt
      reading manifest file 'yarl.egg-info\SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no previously-included files matching '*.pyc' found anywhere in distribution
      warning: no previously-included files matching '*.cache' found anywhere in distribution
      warning: no previously-included files found matching 'yarl\_quoting.html'
      warning: no previously-included files found matching 'yarl\_quoting.*.so'
      warning: no previously-included files found matching 'yarl\_quoting.pyd'
      warning: no previously-included files found matching 'yarl\_quoting.*.pyd'
      no previously-included directories found matching 'docs\_build'
      adding license file 'LICENSE'
      writing manifest file 'yarl.egg-info\SOURCES.txt'
      copying yarl\__init__.pyi -> build\lib.win-amd64-cpython-311\yarl
      copying yarl\_quoting.c -> build\lib.win-amd64-cpython-311\yarl
      copying yarl\_quoting.pyx -> build\lib.win-amd64-cpython-311\yarl
      copying yarl\py.typed -> build\lib.win-amd64-cpython-311\yarl
      running build_ext
      building 'yarl._quoting' extension
      creating build\temp.win-amd64-cpython-311
      creating build\temp.win-amd64-cpython-311\Release
      creating build\temp.win-amd64-cpython-311\Release\yarl
      "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Python\include -IC:\Python\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /Tcyarl/_quoting.c /Fobuild\temp.win-amd64-cpython-311\Release\yarl/_quoting.obj
      _quoting.c
      yarl/_quoting.c(196): fatal error C1083: Cannot open include file: 'longintrepr.h': No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.36.32532\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for yarl
Failed to build yarl
ERROR: Could not build wheels for yarl, which is required to install pyproject.toml-based projects

C:\GitHub\Azure-Cognitive-Search-Azure-OpenAI-Accelerator>

http://localhost:3978/api/messages request body

I want to test it by sending a POST request.
I started the app.py file and opened localhost:3978.
But when I tried to send the POST request, I got this error:
'raise DeserializationError("Cannot deserialize content-type: {}".format(content_type))
msrest.exceptions.DeserializationError: Cannot deserialize content-type: text/plain'

I think my request-body form is wrong.
I referred to the link below, but it didn't work:
https://learn.microsoft.com/en-us/azure/bot-service/rest-api/bot-framework-rest-connector-api-reference?view=azure-bot-service-4.0
Can I get a request-body example?
Please let me know if anybody knows about it.
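Regarding the error above: msrest raises DeserializationError when the POST body is not sent as JSON, so the request needs Content-Type: application/json and an Activity-shaped body. The sketch below builds such a request; the IDs and service URL are placeholders, and a real client sets more Activity fields (id, timestamp, channelId, etc.):

```python
import json

def build_message_request(text: str, token: str) -> tuple:
    """Build headers and body for POST /api/messages (minimal sketch)."""
    headers = {
        # text/plain is what triggers the DeserializationError above
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    # Minimal Bot Framework Activity; values below are placeholders.
    activity = {
        "type": "message",
        "text": text,
        "from": {"id": "user1"},
        "recipient": {"id": "bot"},
        "conversation": {"id": "conv1"},
        "serviceUrl": "https://example.invalid/",  # placeholder
    }
    return headers, json.dumps(activity)
```

Sending this body with any HTTP client (requests, curl) should get past the content-type check, though the 401 auth questions in other issues are a separate concern.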

How do I bring the CSVTabularTool in the frontend

I understand that in the demo it uses the sql connection to query of the tabular data. However, I'd like to include the CSVTabularTool as well.

I did the following:

  • in the bot.py i included the following:
    file_url = "testdata.csv"
    csv_search = CSVTabularTool(path=file_url, llm=llm, callback_manager=cb_manager, return_direct=True)

      and updated the tools list: 
      tools = [www_search,csv_search, sql_search, doc_search, chatgpt_search, book_search]
    

I stored the corresponding testdata.csv in the same folder structure.

When I deployed this and tried @csvfile, I got the following output (see attached image):

Notebook 11. ValueError: An output parsing error occurred.

Hey folks, I keep getting parsing errors for the csv agent in Notebook 11, despite setting the model to gpt4 and adding the handling_parsing_errors=True parameter in the utils.py:451

It is probably related to some regression in the langchain, I wasn't able to pin the root cause yet.


ValueError: An output parsing error occurred. In order to pass this error back to the agent and have it try again, pass `handle_parsing_errors=True` to the AgentExecutor. This is the error: Could not parse tool input: {'arguments': "import pandas as pd\n\ndf = pd.read_csv('covid_data.csv')\nrows = df.shape[0]\nrows", 'name': 'python'} because the `arguments` is not valid JSON.

Notebook 4: text-embedding-ada-002 is hardcoded in the index vectorizer

Right now the vectorizer definition hardcodes the ada embedding, and it breaks when you use text-embedding-3-small, for instance. It should be:

        "vectorizers": [
            {
                "name": "openai",
                "kind": "azureOpenAI",
                "azureOpenAIParameters":
                {
                    "resourceUri" : os.environ['AZURE_OPENAI_ENDPOINT'],
                    "apiKey" : os.environ['AZURE_OPENAI_API_KEY'],
                    "deploymentId" : os.environ['EMBEDDING_DEPLOYMENT_NAME']
                }
            }
        ],
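Building on the fix above, the vectorizer definition can be assembled in Python from environment variables so no embedding deployment name is hardcoded. This is an illustrative sketch using the same variable names as the snippet; the defaults are placeholders:

```python
import os

def build_vectorizer(name: str = "openai") -> dict:
    """Build an Azure AI Search vectorizer definition from env vars,
    so the embedding deployment is not hardcoded in the index."""
    return {
        "name": name,
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
            "resourceUri": os.environ.get("AZURE_OPENAI_ENDPOINT", ""),
            "apiKey": os.environ.get("AZURE_OPENAI_API_KEY", ""),
            "deploymentId": os.environ.get("EMBEDDING_DEPLOYMENT_NAME", ""),
        },
    }
```

Switching embedding models then only requires changing EMBEDDING_DEPLOYMENT_NAME in credentials.env and re-creating the index.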

Code understanding

Hi,
I just want to understand how this code works, how the front end and back end interact, and how the front end takes part in displaying results.

Unauthorized issue when sending a request to 'http://localhost:3978/api/messages'

I run a local server using 'python app.py'.
So I opened my local server and I want to send a request to http://localhost:3978/api/messages.

I generated an access token for the Bot Framework API using https://login.microsoftonline.com/botframework.com/oauth2/v2.0/token with:
Content-Type: application/x-www-form-urlencoded
grant_type=client_credentials
client_id=MY_MICROSOFT-APP-ID
client_secret=MY_MICROSOFT-APP-PASSWORD(Client_Secret)
scope=https://api.botframework.com/.default

The access token was issued and I sent a request to http://localhost:3978/api/messages with the issued token,
but I got a '401: Unauthorized' response.

How can I solve this problem?

Where should I set the table_info parameter?

Hello. In Prompts.py around line 294, there is a variable called table_info, for designating which tables the agent should search. It is not obvious in your code where this variable should be set.

Getting error pydantic.error_wrappers.ValidationError: 1 validation error for AIMessage when trying to interact in live chat with @docsearch after deploying to the web app

The bot works fine locally with @docsearch; however, when running in the web app I am getting an error.

Log from the web app logstream:

2023-10-30T11:26:48.002567988Z
2023-10-30T11:26:48.002588789Z [on_turn_error] unhandled error: 1 validation error for AIMessage
2023-10-30T11:26:48.002594489Z content
2023-10-30T11:26:48.002624989Z none is not an allowed value (type=type_error.none.not_allowed)
2023-10-30T11:26:48.019385627Z Traceback (most recent call last):
2023-10-30T11:26:48.020082732Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/botbuilder/core/bot_adapter.py", line 128, in run_pipeline
2023-10-30T11:26:48.020100033Z return await self._middleware.receive_activity_with_status(
2023-10-30T11:26:48.020962740Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/botbuilder/core/middleware_set.py", line 69, in receive_activity_with_status
2023-10-30T11:26:48.020980040Z return await self.receive_activity_internal(context, callback)
2023-10-30T11:26:48.020985140Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/botbuilder/core/middleware_set.py", line 79, in receive_activity_internal
2023-10-30T11:26:48.020989140Z return await callback(context)
2023-10-30T11:26:48.021769646Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/botbuilder/core/activity_handler.py", line 70, in on_turn
2023-10-30T11:26:48.021803747Z await self.on_message_activity(turn_context)
2023-10-30T11:26:48.037765778Z File "/tmp/8dbd938a08a3f74/bot.py", line 110, in on_message_activity
2023-10-30T11:26:48.037786278Z answer = await loop.run_in_executor(ThreadPoolExecutor(), run_agent, input_text, agent_chain)
2023-10-30T11:26:48.037864979Z File "/opt/python/3.10.12/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2023-10-30T11:26:48.037873179Z result = self.fn(*self.args, **self.kwargs)
2023-10-30T11:26:48.037876779Z File "/tmp/8dbd938a08a3f74/utils.py", line 485, in run_agent
2023-10-30T11:26:48.037880479Z x=agent_chain.run(input=question)
2023-10-30T11:26:48.037883779Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/chains/base.py", line 480, in run
2023-10-30T11:26:48.037887379Z return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
2023-10-30T11:26:48.037890679Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/chains/base.py", line 284, in __call__
2023-10-30T11:26:48.037894179Z final_outputs: Dict[str, Any] = self.prep_outputs(
2023-10-30T11:26:48.037897479Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in prep_outputs
2023-10-30T11:26:48.037900879Z self.memory.save_context(inputs, outputs)
2023-10-30T11:26:48.037904279Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/memory/chat_memory.py", line 38, in save_context
2023-10-30T11:26:48.037907779Z self.chat_memory.add_ai_message(output_str)
2023-10-30T11:26:48.037911079Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/schema/memory.py", line 108, in add_ai_message
2023-10-30T11:26:48.037914579Z self.add_message(AIMessage(content=message))
2023-10-30T11:26:48.037917879Z File "/tmp/8dbd938a08a3f74/antenv/lib/python3.10/site-packages/langchain/load/serializable.py", line 74, in __init__
2023-10-30T11:26:48.037921379Z super().__init__(**kwargs)
2023-10-30T11:26:48.037924679Z File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
2023-10-30T11:26:48.037936279Z pydantic.error_wrappers.ValidationError: 1 validation error for AIMessage
2023-10-30T11:26:48.037939779Z content
2023-10-30T11:26:48.037942979Z none is not an allowed value (type=type_error.none.not_allowed)

The live chat works fine with @sqlsearch and @bing; it only fails with @docsearch.
However, @docsearch works fine locally.


what can be the issue here?
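The last frame in the traceback shows LangChain's memory constructing `AIMessage(content=None)`, i.e. the agent returned `None` as its answer (which can happen when the model call fails silently only in the deployed environment, e.g. content filtering or a deployment-name mismatch). A minimal defensive sketch for `run_agent` in `utils.py` (the fallback text is my own, not from the repo):

```python
def run_agent(question, agent_chain):
    """Run the agent, guarding against a None answer that would otherwise
    crash LangChain's memory when it builds AIMessage(content=None)."""
    answer = agent_chain.run(input=question)
    if answer is None:
        # Assumed fallback text; adjust to taste. The real fix is to find
        # out why the model returns nothing in the deployed environment.
        answer = "I'm sorry, I could not find an answer to that question."
    return answer
```

This only prevents the crash; comparing the Azure app settings (deployment names, API versions) against the working local `.env` is the way to find the root cause.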

cannot import name 'update_vector_indexes' from 'utils'

When I build the frontend web app from the latest repo, the search page fails with the following error:
ImportError: cannot import name 'update_vector_indexes' from 'utils' (/tmp/8dc4360446125ac/utils.py)
Does anyone know what the issue is?
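This usually means the frontend pages were updated to call `update_vector_indexes` while the deployed `utils.py` comes from an older commit (or vice versa); redeploying both from the same commit is the real fix. As a stopgap while debugging, a hedged sketch of a defensive import:

```python
# Degrade gracefully if the deployed utils.py predates this helper.
try:
    from utils import update_vector_indexes
except ImportError:
    update_vector_indexes = None  # older utils.py without this function
```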

"Properties is required for this resource" error when deploying bicep

The following resource is missing the properties field and fails during deployment:

"{"code":"BadRequest","message":"Properties is required for this resource."}"

https://github.com/MSUSAzureAccelerators/Azure-Cognitive-Search-Azure-OpenAI-Accelerator/blob/406038c0e8074b5b43cb995a21e4dcb39e1f1653/azuredeploy.bicep#L197C61-L197C69

The documentation shows the resource should have the following section:

resource formRecognizerAccount 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: formRecognizerName
  location: location
  sku: {
    name: 'S0'
  }
  kind: 'FormRecognizer'
  properties: {}
}

https://learn.microsoft.com/en-us/azure/templates/microsoft.cognitiveservices/2023-05-01/accounts?pivots=deployment-language-bicep

Slowness with latest update; brain agent requires tool name in question

We've noticed that the latest updates have resulted in a decrease in response speed. Our bot, built on this code with two different tools, uses GPT-4 with a capacity of 100 PTUs. Despite this, we're seeing response times of 30-40 seconds for both RAG search and SQL queries. We've also found that omitting the tool name from the question leads to inadequate responses, an issue that wasn't present before the recent updates.
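Before tuning anything, it helps to confirm which part of the pipeline is slow. A minimal, generic timing wrapper (not from the repo) that can be applied to the agent call or to individual tool functions:

```python
import time
from functools import wraps

def timed(fn):
    """Print how long each wrapped call takes, to pinpoint the slow step."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.2f}s")
        return result
    return wrapper
```

Wrapping the RAG tool and the SQL tool separately should show whether the latency comes from the model round-trips or from the search/database side.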

High costs

We have deployed this Repository using our own data. GPT-4-32k is the model that is used.

We found that Azure OpenAI is extremely expensive when using this solution.
One example:
2 questions --> $1.80

Even Azure Cognitive Search is around $7 a day, and the frontend and backend apps are each around $15 a day.

If more people in the organisation were to use this tool, it would get expensive very fast.

Is there a way to lower the cost of using this solution?

Thank you in advance :)
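For a rough sanity check on the $1.80 figure: GPT-4-32k list prices at the time were roughly $0.06 per 1K prompt tokens and $0.12 per 1K completion tokens (verify against current Azure pricing). An agent with a large system prompt and several tool round-trips can easily consume over 10K tokens per question, which is consistent with ~$0.90/question. A quick back-of-the-envelope estimator, with those assumed prices:

```python
# Assumed GPT-4-32k list prices per 1K tokens; check current Azure pricing.
PRICE_PER_1K_PROMPT = 0.06
PRICE_PER_1K_COMPLETION = 0.12

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Rough per-request cost estimate in USD."""
    return (prompt_tokens / 1000) * PRICE_PER_1K_PROMPT \
         + (completion_tokens / 1000) * PRICE_PER_1K_COMPLETION

# e.g. a 14K-token prompt plus a 500-token answer is about $0.90
```

The usual levers are routing most turns to a cheaper model (GPT-3.5-Turbo or GPT-4-Turbo), trimming the agent's system prompt and tool descriptions, and choosing lower tiers for Azure AI Search and the App Service plans.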

Blob search credentials do not work

Hi folks, I ran into an issue where the connection string wasn't working and the indexer indexed 0 files. I changed the connection string, removing all endpoints except the Blob endpoint, and now it works. I'll send a pull request.
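For reference, one common shape of a blob-only connection string (placeholders, not real values; the exact form depends on whether you authenticate with an account key or a SAS token):

```text
BlobEndpoint=https://<storage-account>.blob.core.windows.net/;SharedAccessSignature=<sas-token>
```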

Request for Meeting to Demo Azure-Cognitive-Search-Azure-OpenAI-Accelerator

Dear Sir

I hope this finds you well. I recently came across an innovative solution, namely the Azure-Cognitive-Search-Azure-OpenAI-Accelerator, and I am truly impressed by the potential it holds.

To gain a deeper understanding of the functionalities and capabilities of your Azure-Cognitive-Search-Azure-OpenAI-Accelerator, I would like to request a meeting for a personalized demonstration. I'm looking to develop this kind of solution in Azure.

Could I arrange a demonstration of your solution at our premises or through an online conference? I'm flexible regarding the format and willing to accommodate your availability.

Thank you in advance for considering my request. I am eager to learn more about your solution and explore how it could benefit our company. Please feel free to reach me via email to discuss the details and scheduling of the meeting.

Looking forward to your response. Have a great day!

Best regards,

Notebook 4: wrong model for embedder

Hi!

In Notebook 4, it should be:

embedder = AzureOpenAIEmbeddings(deployment=os.environ["EMBEDDING_DEPLOYMENT_NAME"], chunk_size=1) 

instead of

embedder = AzureOpenAIEmbeddings(deployment=os.environ["GPT35_DEPLOYMENT_NAME"], chunk_size=1) 

1 million documents

  • contains AI-enriched documents from Blob Storage (10k PDFs and 90k articles).
    This accelerator is already working with the 10k PDFs and 90k documents; does it also support 1 million documents? Please confirm before we proceed.

the demo does not seem to have any contextual capability for chatgpt

When I ask a follow-up question, the bot doesn't seem to maintain context. I understand this should be a feature of GPT; at least it should remember a certain number of previous questions.
For example: when I asked it to list 10 news websites, it worked well, but when I then asked "need more", it didn't know what to do at all.
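The repo's bot does persist conversation memory (the stack lists CosmosDB for chat history), so the hosted demo may simply be running without memory wired up. The underlying idea is just a sliding window over past turns that gets prepended to each prompt; a minimal, framework-free sketch of that idea (not the repo's actual mechanism):

```python
from collections import deque

class ChatHistory:
    """Keep the last max_turns exchanges so follow-ups like 'need more'
    still have context when the next prompt is built."""

    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def as_prompt(self) -> str:
        # Rendered history to prepend to the model prompt.
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)
```

In LangChain terms this corresponds to conversation-buffer-style memory attached to the agent; the point is just that the model only "remembers" what is re-sent with each request.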
