Regex-based field search on database fields ✨
Datahub currently has no feature for searching databases by their fields. We propose a regex-based field search, powered by Elasticsearch (it could also be built on Lucene), so that datasets are easier to manage and track.
Here's why:
- Reuse old data for new applications - Users can search for relevant fields first instead of recreating a dataset that already exists
- Save money - Reusing datasets avoids the cost of creating and maintaining duplicates
- Improve performance - With fewer datasets in the system, datastores can perform better
- Track sensitive and legal data - A sensitive field can be searched for across all datasets, so every occurrence is found at once
Example:
Let's say we're looking for *ip_*. The search returns every stored table (for example USER_DATA, RANDOM_DATA) that has a field matching the pattern.
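To make the example concrete, here is a minimal in-memory sketch of the matching step, without Elasticsearch. The catalog contents, the field names, and the `search_tables` helper are all hypothetical and only serve the illustration. The pattern `*ip_*` reads like a wildcard, so the sketch assumes it is written as the regex `.*ip_.*` and matched against the whole field name.

```python
import re

# Hypothetical catalog: table name -> list of field names (made up for this sketch).
CATALOG = {
    "USER_DATA": ["user_id", "ip_address", "email"],
    "RANDOM_DATA": ["id", "src_ip_v4", "payload"],
    "ORDERS": ["order_id", "amount"],
}


def search_tables(catalog, field_regex):
    """Return {table: [matching fields]} for tables with at least one matching field."""
    pattern = re.compile(field_regex)
    hits = {}
    for table, fields in catalog.items():
        # fullmatch: the regex has to cover the entire field name.
        matched = [name for name in fields if pattern.fullmatch(name)]
        if matched:
            hits[table] = matched
    return hits


if __name__ == "__main__":
    print(search_tables(CATALOG, r".*ip_.*"))
```

With this made-up catalog, USER_DATA and RANDOM_DATA are returned and ORDERS is not, because none of its fields contain `ip_`.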
This project uses Elasticsearch, and the API interface can be written in Python. The project provides a proof of concept showing how the API can be set up so that field_regex is passed in as a parameter.
Tools / technologies used: Elasticsearch, Python.
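As a rough sketch of how a field_regex parameter might be turned into an Elasticsearch request, the helper below builds the JSON body for a `regexp` query. The function name `build_field_query` and the default field path `fields.name` are assumptions for illustration and are not taken from the project. The sketch also assumes the field names are indexed as keyword values, and it does not contact a cluster.

```python
import json


def build_field_query(field_regex, field="fields.name"):
    """Build the body of an Elasticsearch regexp query.

    `field` is a hypothetical document path holding a dataset's field names.
    Elasticsearch regexp patterns must match the whole term, so a
    "contains ip_" search is written as ".*ip_.*".
    """
    return {"query": {"regexp": {field: {"value": field_regex}}}}


if __name__ == "__main__":
    # Print the JSON that would be sent to the index's _search endpoint.
    print(json.dumps(build_field_query(".*ip_.*"), indent=2))
```

Keeping the query construction separate from the HTTP call makes it possible to check the request body without a running Elasticsearch instance.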
To get up and running with this project on your local machine, follow these simple steps.
Here's a list of things you'll need to have before installing the software.
- elasticsearch
# Refer: https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html
- Clone the repo
git clone https://github.com/avisionx/metadata-day-2022.git
- Create virtualenv & activate it
python3 -m venv venv
source venv/bin/activate
- Install dependencies
pip install -r requirements.txt
- Run the Python script to execute the example searches
python main.py
The tool can be extended...
- point 1
- point 2
Distributed under the MIT License. See LICENSE for more information.
- Avi Garg - https://avisionx.net/ - [email protected]
- Debashish Ghosh
- Rishika Gupta
Project Link: https://github.com/avisionx/metadata-day-2022