
cohere-ai / quick-start-connectors

Stars: 121, Watchers: 10, Forks: 20, Size: 11.96 MB

This open-source repository offers reference code for integrating workplace datastores with Cohere's LLMs, enabling developers and businesses to perform seamless retrieval-augmented generation (RAG) on their own data.

Home Page: https://docs.cohere.com/docs/connectors

License: MIT License

Languages: Dockerfile 0.39%, Python 99.44%, Perl 0.10%, Shell 0.07%
Topics: connectors, rag, generative-ai, llm

quick-start-connectors's People

Contributors

aalpat4, danodonovan, dcarpintero, dependabot[bot], eugenelightson, ofermend, tianjing-li, tomtobac, walterbm-cohere


quick-start-connectors's Issues

[Confluence] HTML Strip

Which connector is affected?

confluence

What would you like to see improved?

This is part question, part suggestion:

  • Does Cohere clean the HTML when it is sent to the API?
  • If not, do you think removing things like styles from the HTML in the connector is advisable?
  • If the HTML helps with context, we could keep (or simplify) the tags and remove only the styles, right?
  • The UI is showing the raw HTML, which should probably be removed.
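On the HTML questions, one option is to strip markup inside the connector before results are returned. The sketch below is a minimal, hedged example using only Python's standard-library `html.parser`; `strip_html`, `SKIP_TAGS` and `BLOCK_TAGS` are hypothetical names, and it assumes the page body arrives as an HTML string. It drops the contents of `<style>` and `<script>` and keeps only visible text.

```python
from html.parser import HTMLParser

# Hypothetical helper, not part of the Confluence connector.
SKIP_TAGS = {"style", "script"}  # contents of these tags are dropped entirely
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "th", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects visible text and drops the contents of <style>/<script>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self._skip_depth = 0
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._parts.append(" ")  # keep words from adjacent blocks apart

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse the whitespace runs left behind by removed markup
        return " ".join("".join(self._parts).split())


def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

This version keeps no tags at all; keeping a simplified subset (the third bullet) would mean re-emitting selected tags in `handle_starttag`/`handle_endtag` instead of spaces.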


Thanks!

Additional information

No response

Sending additional parameters

Which connector is affected?

All sources that support filtering.

What would you like to see improved?

Is it possible to send additional parameters for metadata filtering, for example:

response = co.chat(
    message="What is the chemical formula for glucose?",
    connectors=[{"id": "my-connector", "params": {"some_field": "some_value"}}],
)

The only way I can think of right now is to pass the parameters at creation time:

created_connector = co.create_connector(
    name="Example connector",
    url="https://connector-example.com/search?some_field=some_value",
)

But that's not very flexible.

Would it work to call the connector API directly with the filters, and then send the results to Cohere through the `documents` parameter of the chat endpoint?

curl --request POST \
    --url 'https://connector-example.com/search' \
    --header 'Content-Type: application/json' \
    --data '{
    "query": "How do I expense a meal?",
    "some_field": "some_value"
  }'

And then

response = co.chat(
    message=message,
    documents=documents,
    conversation_id=self.conversation_id,
    stream=True,
)

Is there a simpler way to achieve this filtering?
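The two-step workaround can be sketched in Python with only the standard library. It assumes the connector accepts extra top-level JSON fields such as `some_field` as filters, authenticates with a bearer token, and answers with `{"results": [...]}`; `search_connector`, `build_search_payload` and `to_chat_documents` are hypothetical helpers, not part of the Cohere SDK.

```python
import json
import urllib.request
from typing import Any


def build_search_payload(query: str, filters: dict[str, Any]) -> dict[str, Any]:
    # Assumption: the connector reads extra top-level JSON fields as filters.
    # "query" is written last so a filter can never overwrite it.
    return {**filters, "query": query}


def search_connector(url: str, api_key: str, query: str, filters: dict[str, Any],
                     timeout: float = 30.0) -> list[dict[str, Any]]:
    body = json.dumps(build_search_payload(query, filters)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the connector responds with {"results": [...]}
        return json.load(response).get("results", [])


def to_chat_documents(results: list[dict[str, Any]]) -> list[dict[str, str]]:
    # Assumption: chat documents are flat string-to-string maps, so values
    # are stringified and empty (None) fields are dropped.
    return [
        {key: str(value) for key, value in result.items() if value is not None}
        for result in results
    ]
```

The returned list would then be passed as `documents=` to `co.chat`, as in the snippet above. The trade-off is that the filtering logic now lives in the caller rather than in Cohere's connector call.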

Thanks!

Additional information

No response

Can't create Asana connector in Cohere

Which connector is affected?

Asana

What is the issue?

I can reach my connector URL with Postman:

curl --location 'https://<my_url>/search' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ey.....' \
--data '{
    "query": "design"
  }'

But when I try to create the connector in Cohere, it fails:

import dotenv from "dotenv";
dotenv.config();

import { CohereClient } from "cohere-ai";

const cohere = new CohereClient({
  token: process.env.COHERE_API_KEY,
});


(async () => {
  const connector = await cohere.connectors.create({
    name: "asana-connector",
    url: "https://<my_url>/search",
    description: "Asana connector",
    service_auth: {
      type: "bearer",
      token: process.env.ASANA_CONNECTOR_TOKEN,
    },
  });

  console.log(connector);
})();

Both API keys are set; I also tried hardcoding them.

This is the error:

BadRequestError: BadRequestError
Status code: 400
Body: {
  "message": "connector not reachable at https://<my_url>/search. error: request failed with status code 401"
}
  statusCode: 400,
  body: {
    message: 'connector not reachable at https://<my_url>/search. error: request failed with status code 401'
  }
}

Any ideas?

I'm following this documentation:

https://docs.cohere.com/reference/create-connector

Thanks
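The error body says the connector answered Cohere's reachability check with a 401, so the connector did not receive a bearer token it accepted. A minimal sketch for reproducing that check outside the SDK, assuming the connector expects `Authorization: Bearer <token>` as in the working curl above; `build_search_request` and `search_status` are hypothetical helpers.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def build_search_request(url: str, token: Optional[str], query: str = "test") -> urllib.request.Request:
    """Build the same POST /search call as the curl above, with or without a bearer token."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def search_status(url: str, token: Optional[str], timeout: float = 30.0) -> int:
    """Return the HTTP status the connector answers with for a /search call."""
    try:
        with urllib.request.urlopen(build_search_request(url, token), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise HTTPError; .code carries the status
        return error.code
```

If `search_status(url, None)` returns 401 and `search_status(url, token)` returns 200, the connector side is fine and the token is probably not reaching Cohere. One thing that may be worth checking (not confirmed here) is the exact property name the JavaScript SDK expects for the service-auth field, since JavaScript SDKs often use camelCase; the SDK's TypeScript type definitions would settle that.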

Additional information

No response

[Confluence] More flexible search

Which connector is affected?

Confluence

What would you like to see improved?

Currently the Confluence API is doing an AND query, meaning every word must appear in a document. This works poorly for conversational input: the question, as the user phrases it, will rarely appear in the documents.

The Confluence API has some search options:

https://confluence.atlassian.com/doc/confluence-search-syntax-158720.html

We could at least document these options.

As a workaround, I'm manually adding ORs to the queries.


Not ideal but better than nothing.

What are your thoughts on this? Maybe we could train an intent classifier to build more sophisticated queries?

Like:

Detected entities: Emily, Project
(What's Emily doing on this project?) OR (Emily Project)
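Both the manual-OR workaround and the entity idea can be sketched as plain string building. This assumes the connector forwards the resulting string to Confluence search unchanged; the stopword list is a tiny hypothetical one, entity detection is assumed to happen elsewhere (the entities are passed in), and `keywords`, `or_query` and `entity_query` are illustrative names.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real one would be larger.
STOPWORDS = {
    "a", "an", "and", "are", "do", "does", "doing", "for", "how", "i", "in",
    "is", "of", "on", "the", "this", "to", "what", "what's", "who",
}


def keywords(question: str) -> list[str]:
    """Return the question's words minus stopwords, deduplicated case-insensitively."""
    seen: set[str] = set()
    words: list[str] = []
    for token in re.findall(r"[A-Za-z0-9']+", question):
        lowered = token.lower()
        if lowered in STOPWORDS or lowered in seen:
            continue
        seen.add(lowered)
        words.append(token)
    return words


def or_query(question: str) -> str:
    """Join keywords with OR so a page only needs to match one of them."""
    return " OR ".join(keywords(question))


def entity_query(question: str, entities: list[str]) -> str:
    """Combine the full question with detected entities, in the format of the example in this issue."""
    if not entities:
        return question
    return f"({question}) OR ({' '.join(entities)})"
```

For the question in the example, `or_query` gives `Emily OR project`, and `entity_query` with `["Emily", "Project"]` reproduces the combined query shown above.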

I'd appreciate your thoughts, @tianjing-li.

Additional information

No response

Update template connector

Which connector is affected?

Flask template

What would you like to see improved?

Some of the logic and docs in the Flask template connector are outdated; notably, it lacks a provider.py and a client.py. The README could also be improved.
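The issue does not spell out the intended layout. Below is a minimal sketch of the kind of split it hints at, assuming `client.py` wraps the upstream API and `provider.py` turns raw records into connector results with `title`/`text`/`url` fields; `InMemoryClient` and `search` are hypothetical stand-ins, and the Flask route that would call `search` is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


class SearchClient(Protocol):
    """What provider.py needs from client.py: a single search call."""

    def search(self, query: str) -> list[dict[str, Any]]: ...


@dataclass
class InMemoryClient:
    """Hypothetical stand-in for client.py; a real one would call the third-party API."""

    records: list[dict[str, Any]] = field(default_factory=list)

    def search(self, query: str) -> list[dict[str, Any]]:
        terms = query.lower().split()
        return [r for r in self.records if any(t in str(r.get("text", "")).lower() for t in terms)]


def search(client: SearchClient, query: str) -> list[dict[str, Any]]:
    """Hypothetical provider.py entry point: map raw records to connector results."""
    return [
        {"title": str(r.get("title", "")), "text": str(r.get("text", "")), "url": str(r.get("url", ""))}
        for r in client.search(query)
    ]
```

Keeping the API wrapper behind a small interface like this lets the provider logic be tested without network access.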

Additional information

No response

Improve CONNECTOR_API_KEY documentation in all READMEs

Which connector is affected?

All connectors

What would you like to see improved?

Currently it is unclear what the CONNECTOR_API_KEY environment variable is used for.

We need to clarify:

  • That the user will need to create and manage their own keys to secure the connector
  • That the CONNECTOR_API_KEY is required for all connectors
  • That this key will be used to authenticate requests to the connector when calling the /search API
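The third point is easiest to see from the connector's side. A minimal sketch of the check, assuming the key arrives as a bearer token (as in the Asana curl examples on this page) and that the caller reads CONNECTOR_API_KEY from the environment; `is_authorized` is a hypothetical helper, not the connectors' actual code.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], connector_api_key: str) -> bool:
    """Check an 'Authorization: Bearer <key>' header against CONNECTOR_API_KEY."""
    if not connector_api_key or not authorization_header:
        return False  # an unset key should never authorize anything
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking how many leading characters matched
    return hmac.compare_digest(token.encode("utf-8"), connector_api_key.encode("utf-8"))
```

In a Flask handler this would be called roughly as `is_authorized(request.headers.get("Authorization"), os.environ["CONNECTOR_API_KEY"])`, answering 401 when it returns False.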

Additional information

No response

Wrong credentials in the MongoDB connector's data loader

Which connector is affected?

MongoDB

What is the issue?

dev/load_data.py has:

client = pymongo.MongoClient(
    host=os.environ.get("MONGODB_HOST", "mongo"),
    port=os.environ.get("MONGODB_PORT", 27017),
    username=os.environ.get("MONGODB_ROOT_USERNAME", "root"),
    password=os.environ.get("MONGODB_ROOT_PASSWORD", "example"),
)

Instead, it should be:

# connection_string: the same MongoDB connection string the connector itself is configured with
client = pymongo.MongoClient(
    connection_string,
)
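A sketch of a loader that prefers the connection string and otherwise keeps today's defaults. `MONGODB_CONNECTION_STRING` is a hypothetical variable name (use whatever the connector itself reads), and `mongo_client_kwargs` is illustrative. It also casts the port to `int`, since environment variable values are always strings.

```python
from collections.abc import Mapping
from typing import Any


def mongo_client_kwargs(env: Mapping[str, str]) -> dict[str, Any]:
    """Build pymongo.MongoClient keyword arguments from environment variables.

    Prefers a full connection string (hypothetical variable name
    MONGODB_CONNECTION_STRING); otherwise falls back to the discrete
    host/port/user/password variables that dev/load_data.py reads today.
    """
    connection_string = env.get("MONGODB_CONNECTION_STRING")
    if connection_string:
        # MongoClient accepts a mongodb:// or mongodb+srv:// URI as `host`
        return {"host": connection_string}
    return {
        "host": env.get("MONGODB_HOST", "mongo"),
        # Environment values are strings, so cast the port explicitly
        "port": int(env.get("MONGODB_PORT", "27017")),
        "username": env.get("MONGODB_ROOT_USERNAME", "root"),
        "password": env.get("MONGODB_ROOT_PASSWORD", "example"),
    }
```

The loader would then create its client with `pymongo.MongoClient(**mongo_client_kwargs(os.environ))`.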

Additional information

No response

[Confluence] KeyError: 'content'

Which connector is affected?

confluence

What is the issue?

For some reason, some non-page objects make it through the search results. They don't have a `content` key, so the connector crashes on them, for example:

{'space': {'key': 'CFS', 'name': 'TEST', 'type': 'global', 'metadata': {}, 'status': 'current', '_expandable': {'operations': '', 'permissions': '', 'description': ''}, '_links': {'self': 'https://xxxx-xxx.atlassian.net/wiki/rest/api/space/CFS'}}, 'title': '@@@hl@@@TEST@@@endhl@@@ Financial Solutions', 'excerpt': '', 'url': '/spaces/CFS', 'resultGlobalContainer': {'title': 'TEST Financial Solutions', 'displayUrl': '/spaces/CFS'}, 'breadcrumbs': [], 'entityType': 'space', 'iconCssClass': 'aui-icon content-type-space', 'lastModified': '2024-02-05T14:04:51.000Z', 'friendlyLastModified': 'about 2 hours ago', 'score': 0.0}

I fixed it this way:

    async def _gather(self, pages, results):
        tasks = []
        for page in pages:
            # Added check for content to avoid errors
            if "content" not in page:
                continue
            # end of added check
            page_id = page["content"]["id"]
            tasks.append(self._fetch_page(page_id, results))
        return await asyncio.gather(*tasks)
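The dumped object is a search hit for a space (`'entityType': 'space'`), so as far as it shows there is no page to fetch. A standalone sketch of the same skip that also logs what it drops; `content_ids`, `gather_pages` and the `fetch_page` callback are hypothetical stand-ins for the connector's `_gather`/`_fetch_page`, and this version returns the fetched pages instead of appending to a shared list.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)


def content_ids(results: list[dict[str, Any]]) -> list[str]:
    """Collect page IDs, skipping search hits (e.g. spaces) that have no 'content'."""
    ids: list[str] = []
    for result in results:
        content = result.get("content")
        if not isinstance(content, dict) or "id" not in content:
            # Log instead of dropping silently, so unexpected entity types stay visible
            logger.debug("Skipping result with entityType=%r", result.get("entityType"))
            continue
        ids.append(content["id"])
    return ids


async def gather_pages(
    results: list[dict[str, Any]],
    fetch_page: Callable[[str], Awaitable[Any]],
) -> list[Any]:
    """Fetch every page concurrently; fetch_page stands in for the connector's _fetch_page."""
    return await asyncio.gather(*(fetch_page(page_id) for page_id in content_ids(results)))
```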

But I'm not sure how this affects the rest of the connector.

Thanks

Additional information

No response

[Confluence] Chunked HTML

Which connector is affected?

Confluence

What is the issue?

The content is not cleaned before chunking, which makes it very hard to clean up in the UI.

This is especially important for citations.


There are pages with giant style sheets making it to the citations.

I understand HTML can be helpful for context and formatting, but maybe including a stripped version field would help?
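A sketch of cleaning before chunking, assuming the page body has already been reduced to plain text (for instance by stripping tags) and that chunking happens on the connector side. It splits on words with an overlap; `chunk_words`, `page_documents` and the default sizes are hypothetical, not the connector's actual chunker.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split plain text into word windows of max_words, each sharing `overlap` words with the previous one."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks: list[str] = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
        start += max_words - overlap
    return chunks


def page_documents(title: str, url: str, plain_text: str,
                   max_words: int = 200, overlap: int = 20) -> list[dict[str, str]]:
    """One document per chunk of the stripped text, so citations never contain markup."""
    return [
        {"title": title, "url": url, "text": chunk}
        for chunk in chunk_words(plain_text, max_words, overlap)
    ]
```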

Sorry if this is expected behavior.

Additional information

No response

[Confluence] Use V2 API

Which connector is affected?

Confluence

What would you like to see improved?

Instead of using the Python SDK (which is only slowly being migrated to V2), use requests to call the Confluence v2 API directly for search and get-page.
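A minimal sketch of the get-page half with the standard library (requests would look much the same). It assumes Confluence Cloud's v2 endpoint `GET /wiki/api/v2/pages/{id}` with the `body-format` query parameter, and basic auth with an Atlassian account email plus API token; search is left out, and `page_url`, `basic_auth_header` and `get_page` are hypothetical names.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Any


def page_url(base_url: str, page_id: str, body_format: str = "storage") -> str:
    """Build the v2 get-page URL, e.g. https://<site>.atlassian.net/wiki/api/v2/pages/<id>."""
    query = urllib.parse.urlencode({"body-format": body_format})
    return f"{base_url.rstrip('/')}/wiki/api/v2/pages/{urllib.parse.quote(page_id)}?{query}"


def basic_auth_header(email: str, api_token: str) -> str:
    """Encode email:token as an HTTP Basic credential."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def get_page(base_url: str, email: str, api_token: str, page_id: str,
             timeout: float = 30.0) -> dict[str, Any]:
    request = urllib.request.Request(
        page_url(base_url, page_id),
        headers={"Authorization": basic_auth_header(email, api_token), "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```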

Additional information

No response

[DOCS] Asana connector misleading docs.

Which connector is affected?

asana

What would you like to see improved?

Documentation

Additional information

The documentation is unclear. Following the steps, I get this error:

{
  "detail": "No authorization token provided",
  "status": 401,
  "title": "Unauthorized",
  "type": "about:blank"
}

I set my env vars:

ASANA_AUTH_TYPE=access_token
ASANA_ACCESS_TOKEN=2/12065118.....
ASANA_WORKSPACE_GID=12065.....
ASANA_CONNECTOR_API_KEY=eyJhbGc...

I found that the documented API call does not include the bearer token:

curl --location 'http://localhost:5000/search' \
--header 'Content-Type: application/json' \
--data '{
    "query": "BBQ"
  }'

After adding the header manually, I got an error about the workspace ID, because the ID shown at admin.asana.com is the wrong one.

I went to this URL to get the ID:
https://app.asana.com/api/1.0/workspaces
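That lookup can be scripted. A minimal sketch with the standard library, assuming the `/workspaces` response has Asana's usual `{"data": [{"gid": ..., "name": ...}]}` shape and that the personal access token is sent as a bearer token; `list_workspaces` and `workspace_gids` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any

ASANA_WORKSPACES_URL = "https://app.asana.com/api/1.0/workspaces"


def workspace_gids(payload: dict[str, Any]) -> dict[str, str]:
    """Map workspace name -> gid from a /workspaces response body."""
    return {ws["name"]: ws["gid"] for ws in payload.get("data", [])}


def list_workspaces(access_token: str, timeout: float = 30.0) -> dict[str, str]:
    request = urllib.request.Request(
        ASANA_WORKSPACES_URL,
        headers={"Authorization": f"Bearer {access_token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return workspace_gids(json.load(response))
```

The gid printed for the right workspace is the value that goes into ASANA_WORKSPACE_GID.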

Final request (the Authorization header is the part missing from the docs):

curl --location 'http://localhost:5000/search' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer eyJhbGc........' \
--data '{
    "query": "BBQ"
  }'

I suggest listing all the environment variables needed for each authentication method. It was not obvious that the workspace ID was required.
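Such a list could also be enforced at startup. A minimal sketch, taking the variables from the `.env` shown in this issue for `ASANA_AUTH_TYPE=access_token`; whether the connector requires all three is an assumption, other auth types are deliberately left out because their variables are not listed here, and `missing_env_vars` is a hypothetical helper.

```python
from collections.abc import Mapping

# Variables from the .env in this issue for ASANA_AUTH_TYPE=access_token.
# Other auth types are omitted because their variables are not listed here.
REQUIRED_BY_AUTH_TYPE = {
    "access_token": ["ASANA_ACCESS_TOKEN", "ASANA_WORKSPACE_GID", "ASANA_CONNECTOR_API_KEY"],
}


def missing_env_vars(env: Mapping[str, str]) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    auth_type = env.get("ASANA_AUTH_TYPE")
    if not auth_type:
        return ["ASANA_AUTH_TYPE"]
    if auth_type not in REQUIRED_BY_AUTH_TYPE:
        raise ValueError(f"no requirements recorded for ASANA_AUTH_TYPE={auth_type!r}")
    return [name for name in REQUIRED_BY_AUTH_TYPE[auth_type] if not env.get(name)]
```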

Once everything was set correctly, it all worked nicely. Thanks for this effort! I'd be glad to rework the docs and open a PR if you like.
