CaseConnect

This project is a semantic search engine for the NamUs missing persons database. It includes tools to scrape every missing-person case and download the images associated with each case, as well as tools to embed the text and images into a vector space. The goal is to let users search the database by text or by image and get back similar cases. This could help law enforcement find existing cases similar to a new one, and make it easier for the public to check whether someone is already in the database.
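Searching "by text or image and getting back similar cases" comes down to a nearest-neighbour lookup over embedding vectors. The following is a minimal brute-force sketch, assuming each case has already been embedded into a list of floats keyed by a case ID. The IDs and toy vectors are hypothetical, and cosine similarity is an assumed (though common) choice of metric rather than something this README specifies.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Score every stored case against the query and return the k best (case_id, score) pairs."""
    scored = [(case_id, cosine_similarity(query, vec)) for case_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-d toy embeddings; real text-embedding-ada-002 vectors have 1536 dimensions.
    toy_index = {"MP1": [0.9, 0.1, 0.0], "MP2": [0.0, 1.0, 0.0], "MP3": [0.7, 0.7, 0.0]}
    for case_id, score in top_k_similar([1.0, 0.0, 0.0], toy_index, k=2):
        print(case_id, round(score, 3))  # MP1 ranks first, then MP3
```

This linear scan is fine for a few thousand cases. The TODO list's planned move from sklearn nearest neighbours to an embedding DB is about doing the same lookup with an index instead of scoring every vector.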

TODO

  • write a scraper for the database
  • embed all the text
  • embed the images
  • build the front end
  • switch from sklearn nearest neighbors to an embedding DB
  • remove extraneous JSON data to reduce embedding cost and improve relevance
  • test search by image
  • move the Cali DB scraper to a separate file, since their DB has SSL issues
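For the item about removing extra JSON data, one approach is to keep only the descriptive fields of each case and drop empty values before the text is sent to the embedding model. The sketch below assumes cases are stored as nested JSON objects. The field names in the example (`subjectDescription`, `circumstances`, `auditLog`) are hypothetical and not taken from the NamUs schema.

```python
import json
from typing import Any, Dict, Iterable

# Values treated as "no information" and stripped before embedding.
EMPTY = (None, "", [], {})


def drop_empty(value: Any) -> Any:
    """Recursively remove None, empty strings, empty lists and empty dicts."""
    if isinstance(value, dict):
        cleaned = {key: drop_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in EMPTY}
    if isinstance(value, list):
        cleaned = [drop_empty(item) for item in value]
        return [item for item in cleaned if item not in EMPTY]
    return value


def case_to_text(case: Dict[str, Any], keep: Iterable[str]) -> str:
    """Keep only whitelisted top-level fields, drop empties, and serialize compactly."""
    wanted = set(keep)
    trimmed = drop_empty({key: val for key, val in case.items() if key in wanted})
    # Compact separators avoid spending characters (and usually tokens) on whitespace.
    return json.dumps(trimmed, separators=(",", ":"), sort_keys=True)


if __name__ == "__main__":
    # Hypothetical case record; only the descriptive fields are kept.
    case = {
        "id": 123,
        "subjectDescription": {"sex": "Female", "hairColor": ""},
        "circumstances": None,
        "auditLog": [{"user": "admin"}],
    }
    print(case_to_text(case, keep=["subjectDescription", "circumstances"]))
```

Whitelisting fields rather than blacklisting them keeps bookkeeping data out of the embedding even if the scraped records gain new fields later.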

stretch goals

  • use ControlNet to convert sketches into images, then run semantic image search on those images
  • generate images live from a sketch and run semantic search on them, so you can see matching people as you draw
  • add a chat that prompts the user for more details if the input is not very descriptive

cost

$0.0004 / 1K tokens × 34,229,931 tokens = $13.6919724 (about $13.69)
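The cost figure above is just tokens / 1000 × price per 1K tokens. The following small helper reproduces it. It uses `Decimal` so the result matches the quoted figure exactly instead of picking up floating-point noise. The price constant is the one quoted in this README and may not reflect current pricing.

```python
from decimal import Decimal

# Price quoted in this README, in USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = Decimal("0.0004")


def embedding_cost(total_tokens: int, price_per_1k: Decimal = PRICE_PER_1K_TOKENS) -> Decimal:
    """Total cost in USD for embedding `total_tokens` tokens at `price_per_1k` per 1K tokens."""
    return price_per_1k * total_tokens / 1000


if __name__ == "__main__":
    print(embedding_cost(34229931))  # 13.6919724
```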

data

| data | filetype | description | embedding model |
|---|---|---|---|
| json_cases | json | raw data | n/a |
| case_images | jpg | raw data | n/a |
| text_embeddings | json | embedded json_cases | text-embedding-ada-002 |
| image_embeddings | json | embedded case_images | ViT-bigG-14 |
| search_text | user input | user input text | ada-002 and ViT-bigG-14 |
| search_image | user input | user input image | ViT-bigG-14 |
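The search_text row means a text query is embedded twice. text-embedding-ada-002 produces a vector to compare against text_embeddings, and ViT-bigG-14's text encoder produces one to compare against image_embeddings. This works because CLIP-style models place text and images in a shared space. One way to merge the two result lists is reciprocal rank fusion (RRF), sketched below. RRF, the `image_to_case` mapping, and the injected `embed_*` callables are assumptions for illustration, not the project's confirmed design. Ranks are fused rather than raw scores because similarity values from two different models are not directly comparable.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank(query: Vector, index: Dict[str, Vector]) -> List[str]:
    """IDs sorted by dot-product similarity (assumes L2-normalized embeddings)."""
    return sorted(index, key=lambda key: dot(query, index[key]), reverse=True)


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists: each list adds 1 / (k + position) to an ID's score."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for position, case_id in enumerate(ranking, start=1):
            scores[case_id] = scores.get(case_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)


def search_text(
    query: str,
    embed_ada: Callable[[str], Vector],        # hypothetical wrapper around text-embedding-ada-002
    embed_clip_text: Callable[[str], Vector],  # hypothetical wrapper around ViT-bigG-14's text encoder
    text_index: Dict[str, Vector],             # case_id -> embedded json_cases
    image_index: Dict[str, Vector],            # image_id -> embedded case_images
    image_to_case: Dict[str, str],             # hypothetical image_id -> case_id mapping
) -> List[str]:
    """Rank case IDs for a text query using both the text and the image embeddings."""
    text_ranking = rank(embed_ada(query), text_index)
    # Collapse image hits to case IDs, keeping each case's best-ranked image only.
    image_ranking: List[str] = []
    for image_id in rank(embed_clip_text(query), image_index):
        case_id = image_to_case[image_id]
        if case_id not in image_ranking:
            image_ranking.append(case_id)
    return reciprocal_rank_fusion([text_ranking, image_ranking])
```

A search_image query would skip the ada-002 branch and rank only against image_embeddings, using ViT-bigG-14's image encoder for the query.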
