Metadata Day 2022

Regex based field search on database fields ✨

About The Project

Datahub currently has no feature for finding datasets by the fields they contain. We propose a regex-based field search powered by Elasticsearch (this could also be done with Lucene directly) so that fields are easier to manage and track.

Here's why:

  • Reuse old data for new applications - one would first search for relevant fields before creating another dataset
  • Save money - reusing datasets avoids the cost of building and storing duplicates
  • Improve performance - with fewer datasets in the system, datastores can perform better
  • Track sensitive and legal data - one can search for a sensitive field across all datasets and find every occurrence at once

Example:

Let's say we're looking for the pattern *ip_*. The search then returns every stored table (for example USER_DATA, RANDOM_DATA) that has a field matching the pattern.
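To make the example concrete, here is a minimal in-memory sketch of the matching step. It is not the project's code: the schemas in `TABLES`, the field names, and the helper `find_tables` are hypothetical, and Python's standard `fnmatch` module stands in for the real search backend.

```python
from fnmatch import fnmatchcase

# Hypothetical table schemas; the field names are made up for illustration.
TABLES = {
    "USER_DATA": ["user_id", "src_ip_addr", "email"],
    "RANDOM_DATA": ["id", "dest_ip_v4"],
    "ORDERS": ["order_id", "postal_code"],
}


def find_tables(pattern, tables=TABLES):
    """Return {table: [matching fields]} for tables with at least one match."""
    matches = {}
    for table, fields in tables.items():
        # fnmatchcase does case-sensitive, shell-style wildcard matching.
        hits = [field for field in fields if fnmatchcase(field, pattern)]
        if hits:
            matches[table] = hits
    return matches


print(find_tables("*ip_*"))
# → {'USER_DATA': ['src_ip_addr'], 'RANDOM_DATA': ['dest_ip_v4']}
```

Note that a broad pattern such as *ip_* also matches names like zip_code, so a tighter pattern may be needed when tracking sensitive fields.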

Built With

This project uses Elasticsearch, and the API interface can be written in Python. It provides a proof of concept of how such an API can be set up, with field_regex passed in as a parameter.
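As a rough sketch of how field_regex could be turned into an Elasticsearch request, the helper below builds the body of a `regexp` query. The function name `build_field_query` and the `field_name` document field are assumptions for illustration, not the project's actual index layout. It also assumes each field is indexed as its own document with a `keyword`-typed `field_name`. The `case_insensitive` option requires Elasticsearch 7.10 or later.

```python
import json


def build_field_query(field_regex, size=100):
    """Build the body of an Elasticsearch regexp query (hypothetical index layout)."""
    if not field_regex:
        raise ValueError("field_regex must be a non-empty string")
    return {
        "size": size,
        "query": {
            "regexp": {
                # "field_name" is an assumed keyword field holding one column name.
                "field_name": {
                    "value": field_regex,
                    "case_insensitive": True,
                }
            }
        },
    }


# Elasticsearch regexp patterns must match the whole term,
# so the wildcard *ip_* is written as .*ip_.* here.
print(json.dumps(build_field_query(".*ip_.*"), indent=2))
```

The resulting dictionary can be sent as the request body to an index's `_search` endpoint. Keeping query construction separate from the client call makes it easy to test without a running cluster.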

Getting Started

To get this project up and running on your local machine, follow these steps.

Prerequisites

Here's a list of things you'll need to have before installing the software.

  • elasticsearch
# Refer: https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html

Installation

  1. Clone the repo
git clone https://github.com/avisionx/metadata-day-2022.git
  2. Create a virtualenv and activate it
python3 -m venv venv
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt
  4. Run the Python script to try the example searches
python main.py

Roadmap

The tool can be extended...

  • point 1
  • point 2

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Avi Garg - https://avisionx.net/
Debashish Ghosh
Rishika Gupta

Project Link: https://github.com/avisionx/metadata-day-2022
