gopher-hunting-elasticsearch's Introduction

Go-ing Gopher Hunting with Elasticsearch and Go

This repository provides an introductory example of using the Elasticsearch Go client to find documents in Elasticsearch. Specifically, it covers three types of search:

  1. Traditional keyword search.
  2. Vector search, making use of the sentence-transformers/msmarco-MiniLM-L-12-v3 model from Hugging Face to generate the embeddings.
  3. Hybrid search combining the keyword and vector approaches.
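To make the three search types concrete, here is a minimal sketch of the JSON request bodies each one might send to the Elasticsearch _search API. It uses only the Go standard library and is not the code from this repository. The field names (title, text_embedding.predicted_value) and the model ID as stored in Elasticsearch are assumptions for illustration, so adjust them to your own mapping and deployment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// keywordQuery builds a full-text match query against a single field.
func keywordQuery(field, text string) map[string]any {
	return map[string]any{
		"query": map[string]any{
			"match": map[string]any{field: text},
		},
	}
}

// vectorQuery builds a kNN search. The query_vector_builder clause asks
// Elasticsearch to embed the text with a deployed model instead of
// receiving a raw vector from the caller.
func vectorQuery(field, modelID, text string, k, numCandidates int) map[string]any {
	return map[string]any{
		"knn": map[string]any{
			"field":          field,
			"k":              k,
			"num_candidates": numCandidates,
			"query_vector_builder": map[string]any{
				"text_embedding": map[string]any{
					"model_id":   modelID,
					"model_text": text,
				},
			},
		},
	}
}

// hybridQuery places the keyword and kNN clauses in one request body so
// that both contribute to the ranking.
func hybridQuery(textField, vectorField, modelID, text string, k, numCandidates int) map[string]any {
	body := vectorQuery(vectorField, modelID, text, k, numCandidates)
	body["query"] = keywordQuery(textField, text)["query"]
	return body
}

func main() {
	// Assumed names for illustration only; they are not taken from this repository.
	const (
		textField   = "title"
		vectorField = "text_embedding.predicted_value"
		modelID     = "sentence-transformers__msmarco-minilm-l-12-v3"
	)

	body := hybridQuery(textField, vectorField, modelID, "gopher", 10, 100)
	out, err := json.MarshalIndent(body, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```

Building the bodies as plain maps keeps the sketch independent of any client version. With the Elasticsearch Go client you would pass the encoded body to a search request, or build the equivalent request with the client's typed API.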

How to Run

Elasticsearch Instance Setup

The quickest way to set up your own cluster is to register for a free trial of Elastic Cloud. You'll then need to perform these additional steps:

  1. Note your Cloud ID
  2. Generate an API Key
  3. Populate your instance with data in the same format as those in the Sources section below
  4. Upload your model from Hugging Face using Eland
  5. Enrich your ingested documents using an ingest pipeline

Pre-requisites

The server reads its connection details from environment variables, which must be set before you run it. I recommend a tool such as direnv, configured through .envrc, with the variables themselves kept in a top-level .env file. Alternatively, set the variables explicitly in your current session in the way your operating system requires.

The following environment variables are required:

  • ELASTIC_CLOUD_ID=<MY_INSTANCE_CLOUD_ID>
  • ELASTIC_API_KEY=<MY_API_KEY>
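As an illustration of failing fast when either variable is missing, here is a minimal sketch using only the standard library. The config type and loadConfig function are hypothetical names introduced for this example and are not taken from the repository's code.

```go
package main

import (
	"fmt"
	"os"
)

// config holds the two values the server needs to reach Elasticsearch.
type config struct {
	CloudID string
	APIKey  string
}

// loadConfig reads both variables through the supplied lookup function
// (os.LookupEnv in production) and reports the first one that is missing
// or empty.
func loadConfig(lookup func(string) (string, bool)) (config, error) {
	cloudID, ok := lookup("ELASTIC_CLOUD_ID")
	if !ok || cloudID == "" {
		return config{}, fmt.Errorf("environment variable ELASTIC_CLOUD_ID is not set")
	}
	apiKey, ok := lookup("ELASTIC_API_KEY")
	if !ok || apiKey == "" {
		return config{}, fmt.Errorf("environment variable ELASTIC_API_KEY is not set")
	}
	return config{CloudID: cloudID, APIKey: apiKey}, nil
}

func main() {
	if _, err := loadConfig(os.LookupEnv); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Avoid printing the values themselves: the API key is a secret.
	fmt.Println("Elasticsearch configuration loaded")
}
```

Passing the lookup function as a parameter means the check can be exercised with a fake environment instead of the real one.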

Starting the server

Running server.go will start a net/http server on port 80 that you can use to query Elasticsearch:

cd server
go run .
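For readers new to net/http, here is a minimal sketch of what a search endpoint on port 80 could look like. It is not the repository's server.go: the /search/keyword path, the q query parameter, the searchFunc type and the stub search are all assumptions made for illustration, and a real handler would call Elasticsearch where the stub is.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// searchFunc is a hypothetical stand-in for a call to Elasticsearch that
// returns the titles of matching documents.
type searchFunc func(term string) ([]string, error)

// respond runs one search and returns the HTTP status and JSON body to send.
// Keeping this separate from the handler makes it testable without a server.
func respond(search searchFunc, term string) (int, []byte) {
	if term == "" {
		return http.StatusBadRequest, []byte(`{"error":"missing q parameter"}`)
	}
	results, err := search(term)
	if err != nil {
		return http.StatusBadGateway, []byte(`{"error":"search failed"}`)
	}
	body, err := json.Marshal(results)
	if err != nil {
		return http.StatusInternalServerError, []byte(`{"error":"encoding failed"}`)
	}
	return http.StatusOK, body
}

// searchHandler adapts respond to the net/http handler interface.
func searchHandler(search searchFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		status, body := respond(search, r.URL.Query().Get("q"))
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		if _, err := w.Write(body); err != nil {
			log.Printf("writing response: %v", err)
		}
	}
}

func main() {
	// Stub search so the sketch runs without a cluster.
	stub := func(term string) ([]string, error) {
		return []string{"stub result for " + term}, nil
	}
	http.HandleFunc("/search/keyword", searchHandler(stub))

	// Binding to port 80 needs elevated privileges on many systems.
	log.Fatal(http.ListenAndServe(":80", nil))
}
```

With this sketch running, a request to http://localhost/search/keyword?q=gopher returns the stub result as a JSON array.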

Navigate to the below URLs to obtain the Gopher search results for each search type:

Slides

The slides from the Women Who Go meetup @ Elastic are available in the docs/slides folder.

Sources

The rodent-focused Wikipedia pages below were extracted to Elasticsearch using the Elastic Web Crawler:

If you're new to Go and would like to build your own web crawler, I recommend trying the concurrent web crawler exercise in the Tour of Go.

Resources

Check out the resources below to learn more about Elasticsearch, keyword search and vector search.

Elasticsearch

  1. Elasticsearch
  2. Elasticsearch Go Client
  3. Understanding Analysis in Elasticsearch (Analyzers) by Bo Andersen | #CodingExplained

Vector Search

  1. code.sajari.com/word2vec
  2. huggingface | pkg.go.dev
  3. What is Vector Search | Elastic

LLMs and Natural Language Processing

  1. BERT 101: State Of The Art NLP Model Explained | Hugging Face
  2. sentence-transformers/msmarco-MiniLM-L-12-v3 | Hugging Face

Contributors

carlyrichmond
