
lingoose's Introduction

πŸ‘¨β€πŸ’» Senior Backend Developer at UFirst | ☁️ Cloud Adept | 🐧Linux/IoT Expert | 🏝️ Full-remote Addicted | ✍️ Content Creator

Blog - simonevellei.com

lingoose's People

Contributors

flyingduck, henomis

lingoose's Issues

v0.0.13

  • #150
  • add drop and delete index methods #153
  • switch to openai tools and update models #156
  • Add HF text to image models support #157
  • implement lingoose thread using openai as LLM backend #158

Implement method for Vector Databases to delete collection/payloads

Is your feature request related to a problem? Please describe.
Sometimes you want to remove a collection or payload from a vector database; these methods are exposed by the vector databases' APIs.

Describe the solution you'd like
Add a method to the index interface to allow for deletion of documents

release v0.0.3

Tasks

  • use new documentation template
  • add linter to github actions #26
  • Refactor pipeline input and output #27
  • Add splitter #28
  • move docs to a different branch #29
  • refactor prompt interface #30

v0.0.8

Tasks

  • support Qdrant #89
  • implement dalle transformer #84
  • support custom openai client (discussion here) #86
  • support new openai models #88
  • support for openai functions see here #90

Fun fact: I started integrating the official Qdrant client, which uses gRPC. To avoid huge imports, a custom qdrant-go package has been developed, based on my restclientgo (already imported in lingoose).

Build issue with index package (using Go 1.21)

Hi! Thank you for this library, very excited to give it a try. When trying to build your quickstart, I run into these issues on go build (Go 1.21).

./main.go:18:8: undefined: index.NewSimpleVectorIndex
./main.go:19:27: undefined: index.NewSimpleVectorIndex
./main.go:19:127: undefined: index.WithTopK

All other packages seem to be correctly installed. Only "index" is preventing the build. Any idea what could be wrong here?

Where do I put my keys?

I understand that OpenAI does not allow shared API keys, but I have been unable to find, in any example, where to put an API key. For example, the quickstart in the docs gives no instructions on where to put my key, despite the example saying that "this is literally it". I also haven't been able to locate anything at all about OpenAI API keys in the documentation.

qapipeline.WithPrompt returns a new QAPipeline instance

Describe the bug
When using qapipeline.WithPrompt, it returns a new QAPipeline instead of modifying the current one. Later, when calling Run or Query, it panics with a nil pointer dereference because the LLMEngine is nil.

To Reproduce

res, err := qapipeline.
	New(openaiClient).
	WithPrompt(chatConv).
	WithIndex(a.index).
	Query(context.Background(), query, option.WithTopK(1))

Expected behavior
The WithPrompt method should modify the current instance and return that one.

Lingoose version: 0.0.12

v0.0.7

Tasks

  • support openai completion and chat streams #73
  • support hugging face conversational #74
  • support hugging face text generation model ref #75
  • add tesseract loader #76
  • add hugging face image to text loader #77
  • add hugging face speech recognition loader #78
  • add hugging face sentence transformer embedder #79
  • summarize #80

release v0.0.4

Tasks

  • 🐞 fix metadata deep copy #32
  • Don't store documents content as metadata, return ID. #33
  • include more unit test #34
  • check all New constructors; if they initialize default values, consider unexporting the struct (easy if all the struct properties are unexported) #35
  • refactor vector upsert & index creation: create a batch upsert (100?) and check if the index already exists #36
    • batchSize = 25 default, index shouldn't be created by lingoose.
    • Document content can be inserted (as done in v0.0.3) into vectors as metadata ref: here
  • provide package errors #37
  • llm metadata callback #38
  • add github star button
  • refactor documentation following the changes of this PR and all related to this issue
  • use String() in prompt #39
  • Add types.Meta for metadata #40
  • add pdf loader #41
  • refactor float64 embeddings #42
  • Support whisper audio output format #43
  • change embedder interface #44
  • use this to chunk and normalize openai embeddings #45
  • refactor loader and add PubMed #46

Why go?

Hi henomis,
Thanks for making this cool project.
I am hesitating over which framework to use: the Go version or the Python version.
Is the only reason to choose a Go framework that it's more friendly to gophers?
Are there any other strong reasons for choosing a Go framework?
Any insight would be much appreciated!

SimpleVectorIndex.load() can easily be called more than once

index/simpleVectorIndex

func (s *Index) IsEmpty() (bool, error) {
	...      
	err := s.load()
	...
}

func (s *Index) SimilaritySearch(ctx context.Context, query string, opts ...option.Option) (index.SearchResponses, error) {
	...
	err := s.load()
	...
}

The invocation of the load() function is subtle, which easily leads to repeated calls, such as in the example above.

We'd better make sure it's loaded only once.

v0.0.11

  • fix README example #117
  • change logo and refactor README #118
  • custom HTTP client support for Huggingface API #122
  • fix multiple load() in simpleVectorIndex #123
  • fix huggingface llm verbosity #124
  • fix directoryLoader validator #119
  • Refactor simpleVectorIndex internal structure: see here #126
  • indexes must be able to work with raw data structures not linked to documents. #127
  • refactor indexes methods Search and Query. #128
  • add a method to append a vector to an index #129
  • implement index retriever #130
  • implement cache #131
  • retriever implementation has the following issues: #132
    • it is strictly linked to documents. A retriever is a helper to access an index.
    • who is in charge of loading documents: the index or the retriever?
    • remove retriever
  • lint code #133
  • Add new QA pipeline mode refine #134
  • Fix: indexes may confuse cosine distance with cosine similarity #135
  • Update docs

release 0.0.6

  • add context to loaders #59
  • audio loader whispercpp #60
  • audio loader whisper api #62
  • csv loader #65
  • libreoffice loader #61
  • llamacpp llm #63
  • llamacpp embeddings #64
  • add pipeline step callback. #67 #68
  • add sql tube #66
  • add support for mysql #69

Inquiry about support for custom HTTP Client of Hugging Face API

Background:

Currently, in the doRequest function of the HuggingFace client, HTTP requests are made using http.DefaultClient. While this works for most scenarios, there is a need for more flexibility when it comes to customizing the behavior of the HTTP client.

Request:

I would like to request an enhancement that allows users to specify their own HTTP client when making requests through the library. This feature would provide users with the ability to configure custom settings for the HTTP client, such as timeouts, custom transport options, or any other client-specific configurations.

Proposed Implementation:

One possible implementation approach could involve modifying the doRequest function to accept an http.Client as an argument. This change would allow users to pass their own pre-configured HTTP client when making requests, as follows:

func (h *HuggingFace) doRequest(ctx context.Context, jsonBody []byte, model string, httpClient *http.Client) ([]byte, error) {
    // Use the provided httpClient for making the request.
    // ...
}

By making this modification, users would have the flexibility to tailor the HTTP client to their specific requirements.

release v0.0.5

Tasks

  • add stop sequence to the openai llm constructor #48
  • Constructors shouldn't return an error, otherwise they will not be composable: a := pkg.New().WithValue().WithSome(). #49
  • t := NewTube(llm, decoder).WithMemory(name, memory): in general, extend with methods to set optional parameters. #50
  • pinecone add option to create index. consider constructor composability. #51
  • add concurrent embedding using goroutines #53
  • A generic loader should have an optional text splitter: compose a loader with WithSplitter(textsplitter); textsplitter must be an interface inside loaders. The result should be something like: loader.NewPDFToTextLoader("/usr/bin/pdftotext", "./kb").WithSplitter(textsplitter.NewRecursiveCharacterTextSplitter(2000, 200)).Load() #54
  • are pipe templates useful? https://twitter.com/matchaman11/status/1655622928535523328?s=46&t=StJvFDFYoKhmJGuu1e_cbA #52
  • refactor prompt templates #55
  • refactor splitter #57
  • refactor readme #58

v0.0.12

  • implement index engines #138
  • implement Milvus as engine #139
  • add index insert data callback #144
  • implement redis vector storage (see here and here) #145
  • implement PostgreSQL engine #146
  • Misc #147

Knowledge Base example can't be run

Describe the bug
When trying to run the Knowledge base example (https://github.com/henomis/lingoose/blob/main/examples/embeddings/knowledge_base/main.go) I got an error about the github.com/henomis/lingoose/index/vectordb/jsondb package.

% go mod tidy
go: finding module for package github.com/henomis/lingoose/index/vectordb/jsondb
go: example.com/lingoosedb imports
	github.com/henomis/lingoose/index/vectordb/jsondb: module github.com/henomis/lingoose@latest found (v0.0.11), but does not contain package github.com/henomis/lingoose/index/vectordb/jsondb

Release v0.0.1-alpha2

Tasks

  • implement github pages lingoose homepage
  • add to the README the concept lingoose = lingo + go + goose
  • request a ⭐

Refactor API

tasks

  • remove examples
  • remove partials
  • remove langchain?
  • New(input interface{}, outputDecoderFn DecoderFn, template string)
  • Add new code
type Decoder interface {
	Decode(interface{}) error
}

type OutputHandler func(string) Decoder

type Template struct {
	Input         interface{}
	Output        interface{}
	OutputHandler OutputHandler
	Template      string

	templateEngine *texttemplate.Template
}


func New(
	input interface{},
	output interface{},
	outputHandler OutputHandler,
	template string,
) (*Template, error) {
	// validate input struct using go struct validator
	// validate template
	templateEngine, err := texttemplate.New("prompt").Parse(template)
	if err != nil {
		return nil, err
	}

	return &Template{
		Input:          input,
		Output:         output,
		OutputHandler:  outputHandler,
		Template:       template,
		templateEngine: templateEngine,
	}, nil
}

func (p *Template) Format() (string, error) {

	var output bytes.Buffer
	err := p.templateEngine.Execute(&output, p.Input)
	if err != nil {
		return "", err
	}

	return output.String(), nil
}

type Llm struct {}

func (l *Llm) Completion(promptTemplate *Template) (interface{}, error) {
	// prompt
	prompt, err := promptTemplate.Format()
	if err != nil {
		return nil, err
	}
	_ = prompt

	var output string
	_ = output // call llm(prompt) -> output

	var llmResponse interface{}
	_ = llmResponse // llm response

	// decode output
	err = promptTemplate.OutputHandler(output).Decode(promptTemplate.Output)
	if err != nil {
		return nil, err
	}

	return llmResponse, nil
}

func (l *Llm) Chat(chat *chat.Chat) interface{} {
	// chat prompt

	messages := chat.ToMessages()
	_ = messages

	// call llm(messages) -> output
	// add message to chat messages?

	return nil
}

type Pipeline struct{}

func (p *Pipeline) Run(llm *Llm, prompt *Template) (interface{}, error) {
	llm.Completion(prompt)

	return prompt.Output, nil
}

Long term plan

