Code Monkey home page Code Monkey logo

embeddinghub's Introduction

featureform

Embedding Store workflow PyPi Downloads Featureform Slack
Python supported PyPi Version featureform Website Twitter

What is Embeddinghub?

Embeddinghub is a database built for machine learning embeddings. It is built with four goals in mind.

  • Store embeddings durably and with high availability
  • Allow for approximate nearest neighbor operations
  • Enable other operations like partitioning, sub-indices, and averaging
  • Manage versioning, access control, and rollbacks painlessly


drawing



Features

  • Supported Operations: Run approximate nearest neighbor lookups, average multiple embeddings, partition tables (spaces), cache locally while training, and more.
  • Storage: Store and index billions vectors embeddings from our storage layer.
  • Versioning: Create, manage, and rollback different versions of your embeddings.
  • Access Control: Encode different business logic and user management directly into Embeddinghub.
  • Monitoring: Keep track of how embeddings are being used, latency, throughput, and feature drift over time.

What is an Embedding?

Embeddings are dense numerical representations of real-world objects and relationships, expressed as a vector. The vector space quantifies the semantic similarity between categories. Embedding vectors that are close to each other are considered similar. Sometimes, they are used directly for “Similar items to this” section in an e-commerce store. Other times, embeddings are passed to other models. In those cases, the model can share learnings across similar items rather than treating them as two completely unique categories, as is the case with one-hot encodings. For this reason, embeddings can be used to accurately represent sparse data like clickstreams, text, and e-commerce purchases as features to downstream models.

Further Reading



Getting Started

Step 1: Install Embeddinghub client

Install the Python SDK via pip

pip install embeddinghub

Step 2: Deploy Docker container ( optional )

The Embeddinghub client can be used without a server. This is useful when using embeddings in a research environment where a database server is not necessary. If that’s the case for you, skip ahead to the next step.

Otherwise, we can use this docker command to run Embeddinghub locally and to map the container's main port to our host's port.

docker run featureformcom/embeddinghub -p 7462:7462

Step 3: Initialize Python Client

If you deployed a docker container, you can initialize the python client.

import embeddinghub as eh

hub = eh.connect(eh.Config())

Otherwise, you can use a LocalConfig to store and index embeddings locally.

hub = eh.connect(eh.LocalConfig("data/"))

Step 4: Create a Space

Embeddings are written and retrieved from Spaces. When creating a Space we must also specify a version, otherwise a default version is used.

space = hub.create_space("quickstart", dims=3)

Step 5: Upload Embeddings

We will create a dictionary of three embeddings and upload them to our new quickstart space.

embeddings = {
    "apple": [1, 0, 0],
    "orange": [1, 1, 0],
    "potato": [0, 1, 0],
    "chicken": [-1, -1, 0],
}
space.multiset(embeddings)

Step 6: Get nearest neighbors

Now we can compare apples to oranges and get the nearest neighbors.

neighbors = space.nearest_neighbors(key="apple", num=2)
print(neighbors)

Contributing


Report Issues

Please help us by reporting any issues you may have while using Embeddinghub.


License

embeddinghub's People

Contributors

antony-eng avatar dependabot[bot] avatar shabbyjoon avatar simba-git avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.