ametnes / nesis

Your AI-powered enterprise knowledge partner. Designed to be used at scale, ingesting large amounts of documents in formats such as pdf, docx, xlsx, png, jpg, jpeg, tiff, mp3 and mp4. Integrates with S3, Windows Shares, Google Drive and more.

Home Page: https://ametnes.github.io/nesis

License: Apache License 2.0

Dockerfile 0.44% Python 68.57% Mako 0.08% Shell 0.08% JavaScript 30.27% HTML 0.36% CSS 0.18% Makefile 0.01%
aig embeddings enterprise-software genai-chatbot genai-usecase rag gpt llm machine-learning ml semantic-search

nesis's Introduction

Nesis


👋 What is Nesis❓

Nesis is an open-source enterprise knowledge discovery solution that connects to a multitude of datasources, collecting information and making it available in a conversational manner. Nesis leverages generative AI to aggregate document chunks collected from different documents in multiple formats, such as pdf, docx and xlsx, and turn them into meaningful, human-readable compositions. This allows you to:

  1. Converse with your documents via a simple chat interface.
  2. Conveniently view comparisons between documents.
  3. Summarise large documents.

Demo

Introduction.to.Nesis.mp4

📜 Documentation

Read the Nesis documentation here

🎰 Main features

Nesis is built to handle large amounts of data at scale. With connectivity to a multitude of datasources, Nesis transforms data from various formats into vector embeddings for use by your LLM of choice.
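To illustrate how vector embeddings make documents searchable, here is a toy sketch (not Nesis's own code) that ranks document chunks by cosine similarity to a query vector. The chunk ids and 3-dimensional vectors are invented; in practice an embedding model produces much longer vectors for both the chunks and the question.

```python
import math

# Invented chunk ids and toy 3-dimensional "embeddings" (real ones come from a model).
CHUNKS: dict[str, list[float]] = {
    "q3-report.pdf#chunk-2": [0.9, 0.1, 0.0],
    "hr-policy.docx#chunk-1": [0.1, 0.8, 0.2],
    "budget.xlsx#chunk-1": [0.7, 0.2, 0.1],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query: list[float], k: int = 2) -> list[str]:
    """Ids of the k chunks most similar to the query vector."""
    return sorted(CHUNKS, key=lambda cid: cosine(query, CHUNKS[cid]), reverse=True)[:k]
```

The top-ranked chunks are the kind of context a RAG pipeline passes to the LLM alongside the user's question.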

An enterprise-ready knowledge discovery solution that empowers users to:

  1. 🗣 Interact with vast document repositories in a conversational AI style.
  2. 🛂 Control access to the document repositories with role-based access control (RBAC), ensuring that enterprise users only view information they are allowed to see.
  3. 🗄 Connect to a wide range of repositories. Currently supported: S3, WindowsNT Shares (for your on-prem Windows environment), MinIO and Sharepoint.
  4. ☁ 🏒 Deploy in your cloud or on-premises.
  5. Manage user sessions.

Getting started

To get started with Nesis, deploy it using one of the options below.

Deploy with Docker Compose

  1. Obtain your OPENAI_API_KEY from https://platform.openai.com/api-keys and update the corresponding entry in the compose.yml file.

    • If you do not have an OPENAI_API_KEY, add the environment variable NESIS_RAG_LLM_MODE=mock to the nesis_rag service in the compose.yml file.
  2. Obtain a Hugging Face token from https://huggingface.co and use it to replace YOUR-HUGGINGFACE-TOKEN in the compose.yml file.
  3. Start all services locally with the provided docker compose file.

    docker-compose -f compose.yml up
    
  4. Then connect to your instance via http://localhost:58000 with the following login credentials:

  5. Connect to your MinIO instance via http://localhost:59001/ with the following login credentials:

    • username = your_username
    • password = your_password
  6. Upload some documents into the documents bucket of your MinIO instance.

  7. Back on your Nesis page, register the MinIO datasource:

    1. Navigate to Settings -> Datasource -> New

    2. Enter the details:

      1. Type: MinIO
      2. Name: documents
      3. Host: http://minio:9000/
      4. Access Key: your_username
      5. Access Secret: your_password
      6. Buckets: documents
    3. Click Create.

    4. Run an ad hoc ingestion by clicking the Ingest button of the datasource.

Deploy with Kubernetes

To deploy Nesis into your Kubernetes cluster, see Helm Instructions.

What does Nesis mean?

Nesis is derived from the Greek noun gnosis, which means knowledge.

Feedback and Feature Request

💡 If you'd like to see a specific feature implemented, feel free to open a feature request ticket. If enough users support the feature, we will be sure to include it in our roadmap.

🐞If you find any functionality not working as expected, please feel free to open a bug report.

⭐ Stars let us know you visited ⭐

Please give us a ⭐ to let us know you visited this page. You are already awesome.

Origins

This project has been inspired by other open-source projects. Here is a list of some of them;

nesis's People

Contributors

akizito, bloodykheeng, mawandm, namwanza, nelson-github, zaylon10, zindazed


nesis's Issues

[Docs] Modify README.md content.

  • Introduce a link to the OpenAI API so that users can easily create an API key.
  • Separate the credentials for the instances.
  • Any other ...

[Feature] Add ability to schedule a datasource ingestion job

Currently all ingestion jobs run on the same schedule. This is impractical: some document repositories may be expensive to access or rarely updated, and a single shared schedule doesn't scale.

This feature introduces the ability to schedule an ingestion job at a regular interval. The schedule should be expressible in a unix cron format.

Acceptance Criteria

  1. An endpoint for scheduling. This will be /v1/tasks. It should accept POST, PUT, GET, DELETE
  2. Introduce RBAC policy to allow for creating tasks. The RBAC resource id is in the format tasks/*
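As a rough sketch of what "expressible in a unix cron format" involves, the function below checks whether a given minute matches a standard 5-field cron expression. It supports only `*`, `*/n`, ranges, comma lists and `start/step`, and it simplifies classic cron by ANDing the day-of-month and day-of-week fields. The function names are hypothetical. The ingestion logs further down show APScheduler in use; if that holds, APScheduler 3.x's `CronTrigger.from_crontab()` would be a more complete choice than hand-rolled parsing.

```python
from datetime import datetime

# (min, max) per field: minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field such as '*', '*/5', '1-5', '0,30' or '10/15' into allowed values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = int(base)
            end = hi if step_text else start  # '10/15' means every 15 starting at 10
        if step < 1 or not lo <= start <= end <= hi:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, when: datetime) -> bool:
    """True if `when` (to the minute) matches a 5-field cron expression.

    Simplification: all five fields are ANDed. Classic cron ORs day-of-month and
    day-of-week when both are restricted; this sketch does not.
    """
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    # Python's weekday() uses Monday = 0; cron uses Sunday = 0.
    actual = [when.minute, when.hour, when.day, when.month, (when.weekday() + 1) % 7]
    return all(
        value in expand_field(field, lo, hi)
        for field, (lo, hi), value in zip(fields, FIELD_RANGES, actual)
    )
```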

[Feature] minio connection validator

Description
We need a validator for minio connection info.

Context
In order to ensure that the connection info provided contains all the necessary attributes, we need a specific minio connection validator. This validator should check for endpoint, password, username at the least.
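A minimal sketch of such a validator, assuming the connection is a dict with hypothetical keys `endpoint`, `username`, `password` and `buckets` (the real Nesis schema may name them differently). It returns a list of problems rather than raising, so a form or API can report all of them at once. It also rejects a missing bucket, since ingestion cannot proceed without one.

```python
from urllib.parse import urlparse

# Hypothetical key names; the actual Nesis connection schema may differ.
REQUIRED_KEYS = ("endpoint", "username", "password", "buckets")


def validate_minio_connection(connection: dict) -> list[str]:
    """Return a list of problems; an empty list means the connection looks valid."""
    errors = []
    for key in REQUIRED_KEYS:
        value = connection.get(key)
        if isinstance(value, str):
            value = value.strip()
        if not value:  # catches None, "", whitespace-only strings and empty lists
            errors.append(f"'{key}' is required")
    endpoint = connection.get("endpoint")
    if isinstance(endpoint, str) and endpoint.strip():
        parsed = urlparse(endpoint.strip())
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("'endpoint' must be an http(s) URL such as http://minio:9000/")
    return errors
```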

Update helm chart

A user should be able to deploy nesis using the provided helm chart.

  • Document values file
  • Add helm chart unit tests
  • Add a GH action for helm chart unit tests

[Feature] Extract text from documents

Description
As a user, I'd like to extract text from documents.

Detail
Text extraction is useful to allow for intermediary steps in document ingestion. This will allow for other processes such as:

  1. Data cleansing
  2. Data exclusion based on an exclusion list.
  3. Approval workflows

Acceptance Criteria

  1. An API /v1/extractions/text in the RAG microservice.
  2. Extraction path added to the API microservice during document processing.
  3. Persisting the extracted text to an external SQL datasource.

[BUG] User session never times out

Nesis version

0.1.0

Describe the bug

A logged in user's session never times out.

To reproduce

No response

Expected behavior

The logged-in user's session should time out after a set (customizable) period, perhaps specified in seconds.
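One way to model this is an idle timeout: a session expires once a configurable number of seconds passes without activity, and each request "touches" the session to extend it. The sketch below assumes that model; the class name and the 30-minute default are hypothetical, and an absolute lifetime (expiry measured from login) is an equally valid choice.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical default; the issue asks for a customizable value in seconds.
SESSION_TIMEOUT_SECONDS = 30 * 60


@dataclass
class Session:
    user_id: str
    # Wall-clock time, so the value still makes sense if sessions are persisted.
    last_seen: float = field(default_factory=time.time)

    def is_expired(self, timeout: float = SESSION_TIMEOUT_SECONDS, now: Optional[float] = None) -> bool:
        """Idle timeout: expired once `timeout` seconds pass without activity."""
        now = time.time() if now is None else now
        return now - self.last_seen > timeout

    def touch(self, now: Optional[float] = None) -> None:
        """Record activity, extending the session (sliding expiry)."""
        self.last_seen = time.time() if now is None else now
```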

Screenshots

No response

Additional context

No response

[Feature] Manage roles/policy frontend form

Description
Create a UI form for managing roles and policies.

Detail
The current method of adding roles and policies is too technical. To make it easier to create and manage policies, we need a form to manage roles and policies.

Acceptance Criteria

  1. A form to add/edit a role's policies
  2. The user should only need to select and click policies to add to the role.

[Feature] Add Sharepoint datasource

Describe the solution you'd like
This feature adds Sharepoint as a datasource.

Additional context
This is particularly useful for Microsoft shops.

[Build] Migrate helm chart to helm repository

Description
In order to make it easier for developers to deploy Nesis, we need a helm chart that is reachable outside this GH repository from a helm chart repository.

Detail
Having a helm chart in a dedicated repository makes it possible to integrate the deployment of Nesis in an automated fashion, such as in a CI/CD process.

Acceptance Criteria

  • Delete the ./helm folder
  • Delete the ./.github/workflows/test_helm.yml workflow file
  • A helm chart in the https://ametnes.github.io/helm repository deployable with
     helm repo add ametnes https://ametnes.github.io/helm && helm repo update
     helm upgrade --install nesis ametnes/nesis
    

[BUG] Samba document ids are always unique

Nesis version

0.1.1

Describe the bug

Samba generates different ids for the same document. Document ids need to be the same (reproducible) for a given full document path.
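A common way to get reproducible ids is a name-based UUID: `uuid.uuid5` hashes a fixed namespace plus a name with SHA-1, so the same full path always yields the same id, whereas `uuid.uuid4` is random on every call. The sketch assumes the full path is the document's identity and normalises backslashes so that `\\server\share` and `//server/share` forms agree; whether to also lowercase paths (Windows shares are usually case-insensitive) is left as a design decision. The function name is hypothetical.

```python
import uuid


def document_id(full_path: str) -> str:
    """Stable id for a document: the same normalised full path gives the same UUID."""
    normalised = full_path.replace("\\", "/").rstrip("/")
    # Any fixed namespace works; it only has to stay the same between runs.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, normalised))
```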

To reproduce

No response

Expected behavior

No response

Screenshots

No response

Additional context

No response

[Feature] Ability to search for images and videos.

I should be able to search for videos or images, e.g. by the title of the image or video, by geo-coordinates, also with the context of the video and so on.

When I use nesis to analyse videos, I want the following types of information to be extracted or analysed, depending on my goals or application. Here are some common types of information I want nesis to output from the videos and images:

  1. Video specification
  • Object Detection and Tracking: Identify and track objects or people within the video frame over time. This could include detecting vehicles, pedestrians, animals, or specific objects of interest.

  • Action Recognition: Recognize human actions or activities depicted in the video, such as walking, running, sitting, or gestures.

  • Facial Recognition: Identify and recognize faces of individuals appearing in the video, potentially matching them against a database of known individuals.

  • Emotion Recognition: Analyze facial expressions to infer the emotional state of individuals within the video, such as happiness, sadness, anger, or surprise.

  • Speech Recognition and Transcription: Convert spoken words within the video into text, enabling transcription and analysis of dialogue or speech content.

  • Scene Understanding: Understand the overall context or scene depicted in the video, such as indoor or outdoor settings, specific locations, or environmental conditions.

  • Object Localization: Determine the spatial location of objects within the video frame, potentially enabling tasks like object counting or density estimation.

  • Audio Analysis: Analyze audio content within the video, such as identifying background sounds, music, or speech patterns.

  • Event Detection: Detect specific events or occurrences within the video, such as accidents, crowds, celebrations, or anomalies.

  • Sentiment Analysis: Analyze the overall sentiment or mood conveyed by the video content, based on visual and audio cues.

  • Content Summarization: Automatically generate summaries or highlights of the video content, highlighting key moments or segments.

  • Video Enhancement: Enhance the quality of the video by adjusting parameters like brightness, contrast, or stabilization.

  2. Image specification
  • Object Detection and Recognition: Identify and label objects present in the image, such as cars, people, animals, or household items.

  • Image Classification: Categorize images into predefined classes or categories, such as identifying whether an image contains a cat or a dog.

  • Semantic Segmentation: Segment the image into different regions and assign a label to each region based on its semantic meaning. For example, separating foreground objects from the background.

  • Text Detection and Recognition: Identify and extract text from images, which can be useful for tasks like reading license plates, recognizing handwritten notes, or extracting information from documents.

  • Facial Recognition: Identify and recognize faces in the image, potentially matching them against a database of known individuals.

  • Scene Understanding: Understand the overall context or scene depicted in the image, such as whether it's indoor or outdoor, daytime or nighttime, and the general activities or events taking place.

  • Image Quality Assessment: Assess the quality of the image, including factors like resolution, brightness, contrast, and sharpness.

  • Image Enhancement: Automatically enhance the quality of the image by adjusting parameters like brightness, contrast, and color balance.

  • Image Similarity and Search: Compare the image with a database of other images to find similar or visually related images.

  • Metadata Extraction: Extract metadata embedded in the image file, such as location information, camera settings, and timestamps.

  • Anomaly Detection: Identify unusual or abnormal patterns within the image that may indicate potential problems or anomalies.

[Feature] Integrate Apps with Nesis

Description
Allow other apps to integrate with Nesis.

Detail
This feature enables Nesis to be used within the enterprise by integrating with existing enterprise applications. This allows application developers to build applications that leverage generative AI applying it to internal enterprise applications and take advantage of Nesis' data ingestion, security and privacy model.

Acceptance Criteria

  1. An app can be registered and given a unique application token. This token is used to authenticate the app on every API request.
  2. An app can have a role attached to it. When this is the case, the permissions of every API request are validated against the actions it attempts to perform.
  3. When an API request is made by the app and an X-Nesis-Request-UserKey is present in the header, then the request's action is validated against the roles attached to the user (whose user_id is the value of the header key X-Nesis-Request-UserKey).
  4. An app token can have an expiry.
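A minimal in-memory sketch of criteria 1, 3 and 4: registering an app returns a random token (only its SHA-256 hash is stored), each request's token is checked for existence and expiry, and the X-Nesis-Request-UserKey header decides whether the user's or the app's roles are evaluated. All names other than the header are hypothetical, and the role check itself is left out.

```python
import hashlib
import secrets
import time
from dataclasses import dataclass
from typing import Mapping, Optional

USER_KEY_HEADER = "X-Nesis-Request-UserKey"


@dataclass
class App:
    name: str
    roles: list[str]
    expires_at: Optional[float] = None  # None: the token never expires


# token hash -> App. In-memory only for the sketch; a service would use its database.
APPS: dict[str, App] = {}


def _hash(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def register_app(name: str, roles: list[str], ttl_seconds: Optional[float] = None) -> str:
    """Create an app and return its token; only the token's hash is kept."""
    token = secrets.token_urlsafe(32)
    expires_at = time.time() + ttl_seconds if ttl_seconds is not None else None
    APPS[_hash(token)] = App(name, list(roles), expires_at)
    return token


def resolve_principal(token: str, headers: Mapping[str, str], now: Optional[float] = None) -> tuple[str, str]:
    """Return ("user", user_id) or ("app", app_name): whose roles the request is checked against."""
    now = time.time() if now is None else now
    app = APPS.get(_hash(token))
    if app is None:
        raise PermissionError("unknown application token")
    if app.expires_at is not None and now >= app.expires_at:
        raise PermissionError("application token has expired")
    user_key = headers.get(USER_KEY_HEADER)
    return ("user", user_key) if user_key else ("app", app.name)
```

A real service would read headers through its web framework's case-insensitive header mapping; the plain dict lookup used here is case-sensitive.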

[Feature] End to end encryption of communication

Description
Communication between microservices can be encrypted for secure communication.

Detail
Currently communication between microservices is not encrypted. In a highly secure environment, encrypted communication is a requirement. This issue introduces encryption between the Frontend, API and RAG engine.

Acceptance Criteria

  1. Encryption is optional.
  2. When encryption is enabled, all communication is encrypted by mTLS.
  3. Certificates can be self-signed with a common CA, with different certificates for the Frontend, API and RAG.
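With Python's standard `ssl` module, the two sides of mutual TLS could be configured roughly as below: the server requires a client certificate signed by the common CA, and the client verifies the server and presents its own certificate. The helper names and file paths are hypothetical. The certificate and key arguments are optional only so the sketch runs without files; a real mTLS setup must load them.

```python
import ssl
from typing import Optional


def server_context(ca_file: Optional[str] = None, cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a service (e.g. the API) that requires client certificates."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject peers without a cert signed by a trusted CA
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # the common CA
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # this service's own cert
    return ctx


def client_context(ca_file: Optional[str] = None, cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a caller (e.g. the frontend calling the API) that presents its own cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Because encryption is optional (criterion 1), a service would only build and use these contexts when a configuration flag enables it.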

[BUG] Updating datasource connection overrides fields

Nesis version

0.0.2

Describe the bug

When a datasource object is updated, the supplied connection details replace the existing ones, so valid connection attributes such as the password get overridden with invalid (blank) ones.

To reproduce

  1. Add a datasource in the UI
  2. Edit the datasource and update only the endpoint, leaving the username and password blank
  3. Run an ingestion
  4. Ingestion will fail with invalid credentials

Expected behavior

It is expected that only supplied connection object attributes should be updated.
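A sketch of the expected behaviour, assuming connection details are dicts and that a blank form field arrives as `None` or an empty string: only non-blank supplied values overwrite the stored ones. The function name is hypothetical.

```python
def merge_connection(existing: dict, supplied: dict) -> dict:
    """Return a copy of `existing` updated only with non-blank supplied values."""
    merged = dict(existing)
    for key, value in supplied.items():
        if value is None or value == "":
            continue  # blank form field: keep the stored value
        merged[key] = value
    return merged
```

The trade-off is that a blank field can no longer be used to clear a value deliberately; an API with explicit PATCH semantics (an absent key means "unchanged") would avoid that ambiguity.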

Screenshots

No response

Additional context

No response

[Feature] Use decorators for authentication and authorization

Description
The current implementation of auth is very verbose. We should make it easier to work with by using function decorators to encapsulate the auth checks for each function.


Acceptance Criteria

  1. Migrate authorized and authorized_resources functions into decorators.
  2. Replace all instances of authorized and authorized_resources with the new decorators.
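A minimal sketch of the decorator approach. The name `requires_permission`, the session-as-first-argument convention and the `permissions` set of `(action, resource)` pairs are hypothetical stand-ins; the existing `authorized` and `authorized_resources` functions are not reproduced here, and the check uses exact matching for brevity.

```python
import functools
from typing import Any, Callable


class UnauthorizedAccess(Exception):
    pass


def requires_permission(action: str, resource: str) -> Callable:
    """Hypothetical decorator: check the caller's session before running the handler."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the handler's name and docstring
        def wrapper(session: dict, *args: Any, **kwargs: Any) -> Any:
            if (action, resource) not in session.get("permissions", set()):
                raise UnauthorizedAccess(f"{action} on {resource} is not permitted")
            return func(session, *args, **kwargs)
        return wrapper
    return decorator


@requires_permission("read", "datasources/*")
def list_datasources(session: dict) -> list[str]:
    return ["documents"]
```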

[Package] package rag engine with pytorch@cpu

Describe the solution you'd like
In order to make the RAG engine docker image small enough, we should package it with only pytorch cpu support. This will reduce the image size significantly (without nvidia libraries).

Additional context
The RAG engine uses the Huggingface Embeddings library, which requires Pytorch. This change means that when running Huggingface embeddings in local mode, it may take longer to generate embeddings.

[Feature] Authenticate with Microsoft

Description
In order to enable more secure user management, this feature introduces authentication with Microsoft.

Acceptance Criteria

  • A "Sign in with Microsoft" button on the login screen
  • When the user signs in, they are auto registered.

[Feature] Use fastapi for both the API and RAG Services

Description
Currently the API service uses Flask while the RAG engine uses FastAPI. We should use the same framework across the board.

Acceptance Criteria

  1. All REST endpoints for the API service should be served by FastAPI.
  2. Remove all trace of Flask.

[Feature] Dynamically exclude tokens from RAG output

Description
We want the ability to exclude tokens based on an exclusion list combined with RBAC.

Detail
When handling some kinds of information, some tokens need to be excluded. Two scenarios where this is necessary include:

  1. Handling sensitive PII information. In this scenario, one user, such as a customer service representative, may be permitted to view the sensitive information while another user may not. To achieve this, an exclusion list must be created and associated with the roles of the respective users.
  2. Filtering offensive language. This can be useful when we want to exclude some perceived offensive tokens, for example when there are age restrictions.

Acceptance Criteria
This is an exploratory task so this list may be changed.

  1. Add an exclusion list data source. This can include:
    1. SQL database
    2. REST API
    3. File upload
  2. An exclusion list can be associated to a role.
  3. Exclusion rules are applied to users dynamically.
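As a starting point for this exploratory task, the sketch below masks excluded terms in generated text according to the user's roles. The role names and terms are invented, matching is case-insensitive substring matching, and when a user holds several roles it applies the union of their exclusion lists (the most restrictive reading); other combination rules are possible.

```python
import re

# Hypothetical exclusion lists per role (the issue suggests loading them from SQL, a REST API or an upload).
EXCLUSIONS: dict[str, set[str]] = {
    "customer-service": set(),  # permitted to see customer PII
    "analyst": {"ACC-1234", "jane.doe@example.com"},
}


def redact(text: str, roles: list[str], mask: str = "[REDACTED]") -> str:
    """Mask every term excluded for any of the user's roles (case-insensitive substring match)."""
    terms = set().union(*(EXCLUSIONS.get(role, set()) for role in roles))
    if not terms:
        return text
    # Longest terms first, so a term that contains another is masked as a whole.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(alternatives, re.IGNORECASE).sub(mask, text)
```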

[Feature] Authenticate with Google

Description
In order to enable more secure user management, this feature introduces authentication with Google.

Acceptance Criteria

  • A "Sign in with Google" button on the login screen
  • When the user signs in, they are auto registered.

[Feature] Persist policy document as received

Description
We want to persist the role policy document to preserve the format in which it was entered.

Detail
This feature is just for purposes of consistency in order for the user to view the policy in exactly the same format they input the policy in.

Acceptance Criteria
Add policy field to the role entity.

[Build] Split rag tests for cpu and cuda

Description
We want to run unit tests for cpu separately from those for cuda.

Additional context
This will allow us to ensure that filereaders work in both platform scenarios.

[Feature] Show chat history

Description
We want to show chat history for a given user/session.

Detail
This will allow users to retrieve previous search results.

[BUG] User must not be able to disable themselves

Nesis version

0.1.0

Describe the bug

It is currently possible for a user to disable themselves. This effectively locks the user out of Nesis.

To reproduce

No response

Expected behavior

When a user attempts to disable themselves, the action should be rejected and they should be shown the message:

It is not possible to disable this user.

Screenshots

No response

Additional context

No response

[Feature] Dynamic datasource connection form

Description
For clarity, we need to have datasource type specific connection detail input forms.

Detail
This will improve setting up datasources by making it clear which fields are necessary for each datasource type.

Acceptance Criteria
When creating a new datasource, once a user selects a datasource type, the user should only be presented with the input fields specific to that datasource.

[Feature] Add version env variable to all docker images

Description
A version env variable helps us to know exactly what version we are running.

Detail
This can be useful in displaying on the frontend.

Acceptance Criteria

  1. A NESIS_VERSION environment variable should be available to all containers.
  2. A GIT_COMMIT_HASH environment variable should be available to know which commit each container was built from.

[BUG] Sharepoint ingestion fails with remote end closed connection without response

Nesis version

0.1.0

Describe the bug

During a long-running Sharepoint ingestion process, the following error is logged:

[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis] 2024-05-04 01:44:36.695 [WARNING ] nesis.api.core.document_loaders.sharepoint - Error when getting and ingesting file Stock Market Wizards (Jack D. Schwager) (z-lib.org).pdf - ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Generating embeddings:   0%|          | 0/14 [00:00<?, ?it/s]Killed
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis] 2024-05-04 01:44:37.469 [ERROR   ] nesis.api.core.document_loaders.sharepoint - Error fetching and updating documents - Error: (None, None, "401 Client Error: Unauthorized for url: https://site.sharepoint.com/sites/nesis-test/_api/Web/GetFolderById('d5bc341a-8557-4c67-8c40-1cb0e085def9')?$select=Files&$expand=Files")
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis] Traceback (most recent call last):
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]   File "/app/.venv/lib/python3.11/site-packages/office365/runtime/client_request.py", line 38, in execute_query
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]     response.raise_for_status()
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]   File "/app/.venv/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]     raise HTTPError(http_error_msg, response=self)
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis] requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://site.sharepoint.com/sites/nesis-test/_api/Web/GetFolderById('d5bc341a-8557-4c67-8c40-1cb0e085def9')?$select=Files&$expand=Files
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis] During handling of the above exception, another exception occurred:
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis] Traceback (most recent call last):
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]   File "/app/nesis/api/core/document_loaders/sharepoint.py", line 117, in _sync_sharepoint_documents
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]     _process_folder_files(
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]   File "/app/nesis/api/core/document_loaders/sharepoint.py", line 168, in _process_folder_files
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]     _files = folder.get_files(False).execute_query()
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]   File "/app/.venv/lib/python3.11/site-packages/office365/runtime/client_object.py", line 52, in execute_query
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]     self.context.execute_query()
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]   File "/app/.venv/lib/python3.11/site-packages/office365/runtime/client_runtime_context.py", line 183, in execute_query
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]     self.pending_request().execute_query(qry)
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]   File "/app/.venv/lib/python3.11/site-packages/office365/runtime/client_request.py", line 42, in execute_query
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis]     raise ClientRequestException(*e.args, response=e.response)
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis] office365.runtime.client_request_exception.ClientRequestException: (None, None, "401 Client Error: Unauthorized for url: https://site.sharepoint.com/sites/nesis-test/_api/Web/GetFolderById('d5bc341a-8557-4c67-8c40-1cb0e085def9')?$select=Files&$expand=Files")
[resource-1584063407-nesis-api-6c6f84957f-gflsf nesis] 2024-05-04 01:44:37.503 [INFO    ] apscheduler.executors.default - Job "ingest_datasource (trigger: date[2024-05-04 00:19:28 UTC], next run at: 2024-05-04 00:19:28 UTC)" executed successfully


To reproduce

  1. Create a sharepoint datasource
  2. Add multiple large documents to the Sharepoint
  3. Run the ingestion... after a while, the API service logs show a 401 Client Error: Unauthorized for url...

Expected behavior

The ingestion should run continuously. It appears that the Sharepoint client's authentication needs to be refreshed.
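One possible fix is to treat a 401 as a signal to re-authenticate and retry. The sketch below is generic: `Unauthorized` stands in for the 401 error, and the simulated client and refresh function are hypothetical. In the real loader you would catch the `ClientRequestException` seen in the log, confirm it wraps a 401, and rebuild the Sharepoint client context before retrying.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class Unauthorized(Exception):
    """Stand-in for the client's 401 error."""


def call_with_reauth(call: Callable[[], T], reauthenticate: Callable[[], None], max_retries: int = 1) -> T:
    """Run `call`; on a 401, refresh credentials and retry up to `max_retries` times."""
    attempts = 0
    while True:
        try:
            return call()
        except Unauthorized:
            if attempts >= max_retries:
                raise
            attempts += 1
            reauthenticate()


# Simulated client whose access token has expired mid-ingestion.
state = {"token_valid": False, "refreshes": 0}


def list_folder_files() -> list[str]:
    if not state["token_valid"]:
        raise Unauthorized("401 Client Error: Unauthorized")
    return ["report.pdf"]


def refresh_token() -> None:
    state["token_valid"] = True
    state["refreshes"] += 1


files = call_with_reauth(list_folder_files, refresh_token)
```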

Screenshots

No response

Additional context

No response

[API] Sync from MinIO failing

Sync from MinIO failing with

Invalid URL 'None/v1/ingest/files': No scheme supplied. Perhaps you meant https://None/v1/ingest/files

This is due to the document manager referencing the wrong rag endpoint.

The correct settings block is rag, not llm.
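The error string arises because formatting `None` into the URL yields `'None/v1/ingest/files'`. A sketch of failing fast instead, assuming a hypothetical settings shape of `{"rag": {"endpoint": ...}}`; the function name is also hypothetical.

```python
from typing import Any, Mapping


def rag_ingest_url(settings: Mapping[str, Any]) -> str:
    """Build the RAG ingestion URL from the rag settings block, failing fast if it is missing."""
    endpoint = (settings.get("rag") or {}).get("endpoint")
    if not endpoint:
        raise ValueError("settings['rag']['endpoint'] is not configured")
    return endpoint.rstrip("/") + "/v1/ingest/files"
```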

[Docs] Add deployment options documentation

Description
Deployment options for Nesis need to be documented.

Additional context
These include deploying manually with helm, which can also be integrated into a GitOps workflow. Alternatively, the Ametnes Platform can be used to deploy Nesis.

[Feature] Run ingestion tasks in parallel

Description
Introduce parallelization in ingestion tasks.

Detail
In order to improve ingestion throughput, we want to increase the number of processes running a given ingestion task.
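A sketch of parallel ingestion with the standard library's `concurrent.futures`. `ingest_document` is a hypothetical placeholder for fetching a document and sending it to the RAG engine. Threads suit this because the work is mostly network I/O; the issue mentions processes, and `ProcessPoolExecutor` is a drop-in swap if the per-document work turns out to be CPU-bound.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def ingest_document(path: str) -> str:
    """Hypothetical placeholder for downloading one document and sending it to the RAG engine."""
    return f"ingested {path}"


def ingest_all(paths: list[str], max_workers: int = 4) -> tuple[dict, dict]:
    """Ingest documents concurrently; return (results, failures) keyed by path."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ingest_document, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad document should not stop the batch
                failures[path] = exc
    return results, failures
```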

[Feature] Manage users' rbac using user groups

Description
User groups help manage user permissions in bulk.

Detail
Managing user permissions individually can be a challenge. User groups make it simple to assign permissions in bulk. This feature introduces user groups.

Acceptance Criteria

  1. A user group can be created.
  2. A role can be assigned to a user group.
  3. A user can be assigned to a user group.
  4. A user assigned to the user group assumes all permissions from all roles assigned to the user group.
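Criterion 4 amounts to a union: a user's effective policies are those of their own roles plus those of every role attached to each of their groups. A minimal sketch with hypothetical dataclasses, representing each policy item as an `(action, resource)` pair:

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    policies: set[tuple[str, str]]  # (action, resource) pairs


@dataclass
class Group:
    name: str
    roles: list[Role] = field(default_factory=list)


@dataclass
class User:
    name: str
    roles: list[Role] = field(default_factory=list)    # roles assigned directly
    groups: list[Group] = field(default_factory=list)  # roles inherited via groups


def effective_policies(user: User) -> set[tuple[str, str]]:
    """Union of policies from the user's own roles and all roles of their groups."""
    roles = list(user.roles) + [role for group in user.groups for role in group.roles]
    return set().union(*(role.policies for role in roles))
```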

[BUG] Minio ingestion fails with no bucket in connection details

Nesis version

0.0.2

Describe the bug

When no bucket is supplied in the MinIO connection details, the ingestion process fails. MinIO connection details without a bucket should not be permitted.

To reproduce

No response

Expected behavior

No response

Screenshots

No response

Additional context

No response

[BUG] Cannot schedule multiple tasks on a given task type

Nesis version

0.1.0, 0.1.1

Describe the bug

Two datasources cannot be created with the same ingestion schedule.

To reproduce

  1. Create a datasource ds1 with schedule */5 * * * *
  2. Create another datasource ds2 with schedule */5 * * * *; this fails with the error Task already scheduled on this type

Expected behavior

Multiple datasources with the same schedule should be supported.

Screenshots

No response

Additional context

The constraint uq_task_type_schedule seems to be the culprit. It should be on parent_id and schedule rather than on task type and schedule.
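The suggested fix can be demonstrated with an in-memory SQLite table (SQLite is used only so the sketch runs standalone). The table, column and new constraint names are hypothetical; `ingest_datasource` is taken from the job name in the Sharepoint log. With uniqueness on `(parent_id, schedule)`, two datasources can share a schedule while a duplicate schedule for the same datasource is still rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        type TEXT NOT NULL,
        parent_id TEXT NOT NULL,  -- the datasource that owns the task
        schedule TEXT NOT NULL,   -- cron expression
        CONSTRAINT uq_task_parent_id_schedule UNIQUE (parent_id, schedule)
    )
    """
)

INSERT = "INSERT INTO task (type, parent_id, schedule) VALUES (?, ?, ?)"
conn.execute(INSERT, ("ingest_datasource", "ds1", "*/5 * * * *"))
conn.execute(INSERT, ("ingest_datasource", "ds2", "*/5 * * * *"))  # same schedule, other datasource: allowed

try:
    conn.execute(INSERT, ("ingest_datasource", "ds1", "*/5 * * * *"))  # same datasource twice
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

task_count = conn.execute("SELECT COUNT(*) FROM task").fetchone()[0]
```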

[BUG] Policy on wildcard resource restricts actions on a single resource

Nesis version

0.1.0

Describe the bug

When a wildcard policy is present, actions on a single resource are not permitted.

To reproduce

  1. Attach a role with the following policy to a user:
{
  "name": "user-reader",
  "policy": {
    "items": [
      {
        "action": "create", 
        "resource": "users/*"
      },
      {
        "action": "read", 
        "resource": "users/*"
      },
      {
        "action": "delete", 
        "resource": "users/*"
      },
      {
        "action": "update", 
        "resource": "users/*"
      }
    ]
  }
}
  2. Log in as that user
  3. Update the user
  4. The update fails with the error Unauthorized action on resource

Expected behavior

A user should be able to update their details.
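For reference, a wildcard check that would permit the update might look like the sketch below, assuming a single user's resource id has the form `users/<id>`. It uses `fnmatch.fnmatchcase`, so `*` matches any characters (including `/`) and `?` and `[...]` are also special. A check that compared the pattern `users/*` literally against `users/<id>` would fail in the way this issue describes. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(policy_items: list[dict], action: str, resource: str) -> bool:
    """True if any item grants `action` on `resource`; '*' in a pattern matches any characters."""
    return any(
        item["action"] == action and fnmatchcase(resource, item["resource"])
        for item in policy_items
    )


# The policy items from the role in this issue.
items = [{"action": a, "resource": "users/*"} for a in ("create", "read", "delete", "update")]
```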

Screenshots

No response

Additional context

No response

[Docs] Add contribution guidelines

Description
Add contribution guidelines to guide contributors in making new contributions.

Acceptance Criteria
A CONTRIBUTORS.md file should be created.

[Packaging] Create a non-cuda rag engine docker image

The rag engine docker image is installed with cuda libraries. These libraries aren't needed when using embeddings from an OpenAI endpoint (remotely), so the image should not need to include them.

The cuda libraries are introduced via the huggingface embeddings python library.

This ticket explores having an option to package the rag engine without cuda libraries in order to trim down the docker image size.

[API] User with access to a single datasource can list all datasources

This can easily be reproduced:

  1. Create a user.
  2. Assign a role (with access to a single datasource) to that user.
  3. Log in as the user created in step 1.
  4. Go to Settings -> Datasources and you are able to list all datasources.

Expected behavior is to list only the datasource the user has access to.
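A sketch of filtering the list by the user's read policies, assuming datasource resource ids of the form `datasources/<name>` and policy items shaped like those in the wildcard-policy issue above. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def visible_datasources(names: list[str], policy_items: list[dict]) -> list[str]:
    """Keep datasources whose resource id 'datasources/<name>' matches a 'read' policy item."""
    read_patterns = [item["resource"] for item in policy_items if item["action"] == "read"]
    return [
        name for name in names
        if any(fnmatchcase(f"datasources/{name}", pattern) for pattern in read_patterns)
    ]
```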

[Feature] Display Select Datasource Message on Initial Datasource creation screen

Description
Display Select Datasource Type message when the user clicks New datasource button.

Detail
When the user clicks the New datasource button, we would like to display a message that prompts the user to select a datasource type. Remove the current default form fields, since they do not relate to any specific datasource.

Acceptance Criteria
When the user clicks the New datasource button, the form should open with the prompt "Select datasource type" and no other datasource form fields should be displayed.
Respective datasource fields should display on selection of a specific datasource type.

[Feature] Document similarity identification

Description
This is a challenge currently faced by a customer: when two versions of the same document exist, we should be able to return results from only the current version.

Detail
In any organization, it is possible to have a document with multiple versions. This can be meeting minutes with versions evolving over time. The challenge however is making sure that we only consider the latest document version when answering questions about the document.

Acceptance Criteria
Multiple document versions can be accepted and differentiated, with only the current version considered when answering user questions.
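Assuming the hard part, recognising that two files are versions of the same document, is already solved and expressed as a shared `doc_key`, selecting the current version reduces to keeping the highest version per key. The dataclass and field names are hypothetical; retrieval would then only consider chunks from the returned documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentVersion:
    doc_key: str  # hypothetical: identifies "the same document" across versions
    version: int
    path: str


def current_versions(docs: list[DocumentVersion]) -> list[DocumentVersion]:
    """Keep only the highest version of each document, sorted by doc_key."""
    latest: dict[str, DocumentVersion] = {}
    for doc in docs:
        best = latest.get(doc.doc_key)
        if best is None or doc.version > best.version:
            latest[doc.doc_key] = doc
    return sorted(latest.values(), key=lambda d: d.doc_key)


docs = [
    DocumentVersion("board-minutes", 1, "minutes-v1.docx"),
    DocumentVersion("board-minutes", 2, "minutes-v2.docx"),
    DocumentVersion("budget", 1, "budget.xlsx"),
]
```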

[Feature] Add ability to manually run a datasource ingestion job

We want to be able to start a datasource ingestion job manually. This feature will include:

  1. Adding REST endpoints to trigger the run, e.g. /v1/datasources/<id>/ingestions
  2. Adding RBAC access control to ingestion resources

Acceptance Criteria

  • A REST endpoint /v1/datasources/<id>/ingestions accepting POST (create/trigger), GET (retrieve), PUT (update status, e.g. STOPPING the job) and DELETE (deleting an ingestion... probably not a good idea).
  • RBAC Policy with resource id ingestions/*. Since ingestion is a child of a datasource, a user must have access to the datasource before they access an ingestion.
  • The datasource status must be updated to reflect the status of the ingestion.

[Feature] Add s3 datasource

Description
This feature adds s3 as a datasource. This allows for us to source documents from an s3 bucket.

Context
Integration with s3 should be possible via both service account IAM roles and access key/secret credentials.
