text_depot's Introduction

[Figure: Text Depot in action]

Text Depot is a tool for searching and analyzing topics of interest within a large database of text data. The Text Depot dashboard (this repo) provides a front end to a set of Elasticsearch indexes. To use this repository, you must provide one or more Elasticsearch indexes in a particular format.

Setup

  1. Set up an Elasticsearch server.
  2. Create one or more indexes using the Text Depot mappings.
  3. Clone this repo.
  4. Run cp .configs_sample .configs and fill in the relevant values.
  5. Build and run the Docker container:
    DOCKER_BUILDKIT=1 docker build -t text_depot_dashboard . && docker run -it -p 8080:3838 text_depot_dashboard
  6. Open the dashboard in your browser: http://localhost:8080

Elasticsearch

Each data source should be stored in its own Elasticsearch index. For more information on how to configure your Elasticsearch server, see elasticsearch/

Notes

Our workflow contained the following components:

[Figure: Overall Workflow]

This repository contains the dashboard code for Text Depot (blue in the diagram above). The green components were scheduled as cron jobs and kept the indexes in the Elasticsearch database up to date. We wrote a custom Parser for each data source, and a single Annotator class that adds the [neighbourhoods, sentiment, embeddings] fields to each document before inserting it into the index. The orange components were added for authentication and embeddings-based search, and are optional.

text_depot's People

Contributors

ben-gready, reisner, robsonyeg, shimmyd

Forkers

leveche

text_depot's Issues

Only uses indexes that are named coe_td_*

We currently only allow indexes that follow a City of Edmonton naming convention in Elasticsearch:

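# dashboard_utils.R
# Assumes dplyr (for %>% and filter) and stringr (for str_detect) are attached.

index_to_alias_mapping <- function(es, alias_names) {
  # Fetch the index-to-alias mapping from Elasticsearch
  mapping <- elastic::aliases_get(es, index = alias_names)

  # One row per index; do not allow access to anything except
  # indexes beginning with "coe_td_" (the "^" anchors the match at the start)
  mapping_df <- tibble::enframe(mapping, name = "index_name") %>%
    tidyr::unnest_wider(value) %>%
    filter(str_detect(index_name, "^coe_td_"))

...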

This should be generalized: perhaps set in configs?
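
One possible way to generalize this, as a minimal sketch and not the repository's implementation: read the allowed prefix from configuration and filter on it. The function names get_index_prefix and filter_allowed_indexes and the environment variable TEXT_DEPOT_INDEX_PREFIX are hypothetical. The format of .configs is not shown here, so an environment variable stands in for it. The sketch uses base R only and a plain data frame in place of the tibble built in dashboard_utils.R.

```r
# Hypothetical: read the allowed index prefix from the environment,
# falling back to the current hard-coded value when it is not set.
# TEXT_DEPOT_INDEX_PREFIX is an assumed name, not part of the repository.
get_index_prefix <- function(default = "coe_td_") {
  Sys.getenv("TEXT_DEPOT_INDEX_PREFIX", unset = default)
}

# Hypothetical: keep only the rows whose index_name starts with the prefix.
filter_allowed_indexes <- function(mapping_df, prefix = get_index_prefix()) {
  mapping_df[startsWith(mapping_df$index_name, prefix), , drop = FALSE]
}

# Example input standing in for the index-to-alias mapping.
mapping_df <- data.frame(
  index_name = c("coe_td_news", "other_index", "coe_td_council"),
  stringsAsFactors = FALSE
)

allowed <- filter_allowed_indexes(mapping_df, prefix = "coe_td_")
print(allowed$index_name)
```

With this shape, a deployment outside the City of Edmonton would only need to set the prefix in its own configuration. An empty prefix would allow every index, which may or may not be desirable.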

Aggregations don't respect AI search

[Screenshot: Fullscreen_2022-09-29__1_28_PM]

All parts of the app that use aggregations ignore the min_score parameter in the Elasticsearch query. This is not noticeable in the regular search, because there we use a min_score of 0, which does not filter any results from the query. In an AI search, however, we use min_score to filter the query results, and that filter is not applied to the aggregations.

  • The issue is mentioned here, though I couldn't get this solution to work.
