Data Mart (D-MART)

DMART is a data service layer that offers a streamlined way to develop a certain class of solutions with a small to medium data footprint (<=300 million primary entries). DMART is not a one-size-fits-all technology, but it tries to address a wide variety of needs. Specifically, DMART is not suited for systems that hold large data (> 400 million primary entries), nor for systems that require heavily interrelated data modeling or atomic operations (transactions).

As such, DMART serves as a general-purpose, structure-oriented information management system (aka Data-as-a-Service, DaaS).

It represents a low-code information inventory platform (aka content registry/repository/vault) that is able to assimilate various types of data (structured, unstructured and binary). It allows you to treat your valuable data assets as a commodity that you can cleanly author, share and extend. Thus, valuable data assets can be maintained as the mastered version and act as the single source of truth.

The problem DMART attempts to solve

Valuable information (organizational and individual) is getting out of control!

  • Information is dispersed over too many systems, requiring multiple access contexts.
  • Difficult to consolidate and link for consumption, insights, reporting and dashboards
  • Locked to vendors or application-specific data-formats
  • Data piling up over the years becomes chaotic and hard to discover or search
  • Difficult to master, dedup, backup, archive and restore.
  • Difficult to protect and secure

Top highlights ...

  • Data-as-a-Service : Backbone data store where the data assets get declared and used across multiple applications and microservices. The data assets are declared in the logical and business representation rather than classical RDBMS (physical).
  • Standardized API : Publicly-accessible unified api layer allowing interaction with the different types of data; and simplifying the work of application developers.
  • Data longevity : Resilient and time-proof data storage, as data is stored in flat-files directly on the file system. This opens the door for easy access, inspection, validation, backup and change tracking. At any point in time, the redis index can be recreated from the flat-files.
  • User management and access control : "Batteries included" to lift the burden from application development.
  • Microservice friendly : Leveraging a JWT shared secret, additional microservices can automatically reuse the user's session with dmart. There is also a compatible FastAPI skeleton git repository to facilitate the development of additional microservices.
  • Extensible via plugins : Specialized logic (plugins) can be added to react to certain types of activities and content.
  • Entry-oriented : As opposed to document-oriented NoSQL, entry-orientation revolves around consolidating the coherent information unit alongside its belongings (known as "attachments" that can involve textual and/or binary) as one entry.
  • Activities and workflows : Configurable activity (ticket) and workflow management.
  • Messaging and notifications : Ability to trigger different types of notifications and ability to store user messages.
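
The "Microservice friendly" highlight relies on a JWT secret shared between services. As a minimal sketch of what that could look like in a companion microservice, the following verifies a token using only the Python standard library. It assumes HS256 signing; the algorithm, the claim names (`sub`, `exp`) and the secret value are illustrative assumptions, not taken from DMART's code, and `sign_token` only stands in for the issuing service.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(claims: dict, secret: str) -> str:
    """Build an HS256-signed JWT (stands in for the issuing service)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url_encode(_signature(signing_input, secret))}"


def verify_token(token: str, secret: str) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _signature(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


SHARED_SECRET = "change-me"  # hypothetical value; both services must hold the same secret
token = sign_token({"sub": "alice", "exp": int(time.time()) + 60}, SHARED_SECRET)
claims = verify_token(token, SHARED_SECRET)
```

In practice a maintained JWT library would normally replace the hand-rolled helpers; the sketch only shows why a single shared secret is enough for a second service to trust the session.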

Core concepts

  • Each coherent information unit (data asset) is declared as an entry.
    • An entry includes all related business information (meta, structured, textual and binary) that can be extended / augmented with attachments.
    • Entries are organized within an arbitrary category structure (folders).
    • Entries are indexed for fast search and retrieval.
    • Entries can be optionally linked by "weak" links (aka relations).
    • Changes on entries are recorded for audit and tracking.
    • Structured content: Each structured json content (payload) is associated with a pre-defined json schema stored under the schema section in the space.
    • Arbitrary attachments: An entry can have attachments (binary or otherwise).
  • Entries are stored and organized in an arbitrary hierarchical folder structure (aka categories) on the file system, which facilitates folder-based routes.

DMART is a "Data-first" platform to manage your valuable data/information, allowing you to transform your perception of data from a liability into an asset.

API layer (REST-like, JSON-API)

  • Management : Create/update/delete schema, content, scripts, triggers, users and roles
  • Discovery : Users, paths, scripts, changes/history, schema and content
  • Consumption : Content/attachments, scripts and submissions

Full OpenApi 3 compliant documentation can be found here

Architecture and technology stack

  • Flat-file data persistence on a standard file system, using folders and a clear, simple json format backed by json-schema, alongside text and binary (media/documents) files.
  • Python 3.11 with emphasis on
    • asyncio : maximizing scalability and leverage of server resources and enabling background jobs (post api service time).
    • type hinting and stringent linting (pyright).
  • FastAPI as the api micro-framework (based on our curated fastapi skeleton) and full leverage of Pydantic and OpenApi version 3.
  • Hypercorn (runner server)
  • Redis as the operational data store, with specific leverage of the RediSearch and RedisJSON modules.
  • Intensive json-based logging for easier insights.
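
Because the flat files are the source of truth and Redis is only the operational store, the index can be rebuilt by walking a space folder. The following is a rough sketch of that idea, not DMART's actual indexer: it uses a plain dict in place of Redis, follows the meta-file layout described under "File disposition scheme", and the function name and key format (`<subpath>/<shortname>`) are hypothetical.

```python
import json
from pathlib import Path


def rebuild_index(space_root: Path) -> dict[str, dict]:
    """Collect every entry meta file under a space folder into an in-memory index.

    A real deployment would write these documents to Redis instead of a dict.
    """
    index: dict[str, dict] = {}
    # Entry meta files live at [sub/path]/.dm/[entryshortname]/meta.[entrytype].json.
    # Folder meta files and attachment meta files sit at other depths, so this
    # pattern does not match them.
    for meta_file in sorted(space_root.rglob(".dm/*/meta.*.json")):
        shortname = meta_file.parent.name
        # Three levels up from the meta file is the folder that owns the entry.
        subpath = meta_file.parent.parent.parent.relative_to(space_root).as_posix()
        entry_type = meta_file.name[len("meta."):-len(".json")]
        meta = json.loads(meta_file.read_text(encoding="utf-8"))
        index[f"{subpath}/{shortname}"] = {"entry_type": entry_type, "meta": meta}
    return index
```

Calling `rebuild_index(Path("spaces/myspace"))` (a hypothetical location) returns one record per entry, keyed by its subpath and shortname.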

Terminology

| Term | Description |
|------|-------------|
| space | Top-level business category that facilitates grouping of relevant content. Permissions are defined within the space boundaries |
| subpath | The path within space that leads to an entry. e.g. content/stuff/todo |
| entry | The basic unit of coherent information. |
| shortname | The unique identifier that differentiates an entry among its siblings (i.e. within a subpath) |
| meta | Meta information associated with the entry such as owner, shortname, unique uuid, creation/update timestamp, tags ..etc |
| schema | The entry under schema subpath providing schema definition that can be referenced by structured content entries or attachments |
| attachment | Extra data associated with the entry. An attachment has its own payload |
| payload | The actual content associated with the entry or attachment |
| locator | A link to another entry (within the space or in another space). |
| .dm | The hidden folder used to store meta information and attachments and their payload files |
| permission | The listing of entitlement tuples: actions, content types and subpaths. |
| role | The association of a set of permissions to be granted to a user |
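
To make the permission and role terms concrete, here is a small sketch of how entitlement tuples could be evaluated within a single space. The class names, field names, action names and the rule that a granted subpath also covers everything beneath it are all assumptions made for illustration; DMART's real access-control model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """One entitlement tuple: allowed actions, content types and subpaths."""
    actions: frozenset[str]
    content_types: frozenset[str]
    subpaths: frozenset[str]

    def allows(self, action: str, content_type: str, subpath: str) -> bool:
        if action not in self.actions or content_type not in self.content_types:
            return False
        # Assumption: a granted subpath covers itself and its descendants.
        return any(
            subpath == granted or subpath.startswith(granted.rstrip("/") + "/")
            for granted in self.subpaths
        )


@dataclass(frozen=True)
class Role:
    """A named set of permissions that can be granted to a user."""
    name: str
    permissions: tuple[Permission, ...]


def is_allowed(roles: list[Role], action: str, content_type: str, subpath: str) -> bool:
    """A request passes if any permission of any of the user's roles allows it."""
    return any(
        permission.allows(action, content_type, subpath)
        for role in roles
        for permission in role.permissions
    )


# Hypothetical role: may view and update content entries under content/stuff.
editor = Role(
    name="editor",
    permissions=(
        Permission(
            actions=frozenset({"view", "update"}),
            content_types=frozenset({"content"}),
            subpaths=frozenset({"content/stuff"}),
        ),
    ),
)
can_update = is_allowed([editor], "update", "content", "content/stuff/todo")
can_delete = is_allowed([editor], "delete", "content", "content/stuff/todo")
```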

Entry composition

  • A meta-file (json) that holds meta information about the entry; such as name, description, tags, attributes ...etc.
  • Within the meta file, each entry should have a globally unique UUID and a shortname that must be unique within the parent folder and across the sibling entries.
  • A payload as a separate file (json, text or binary)
  • Change history on that entry.
  • An entry has an arbitrary number of attachments, each attachment has a meta-file and payload.
    • Alteration: Describing a change
    • Comment
    • Relationship: A pointer to another entry
    • Media: Binary payload such as images, videos ...etc
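
The composition rules above can be sketched as a small data model. This is an illustration under assumptions, not DMART's schema: the field names (`displayname`, `created_at` and so on) and the helper functions are hypothetical. It shows the two identity rules, a globally unique UUID per entry and a shortname that is unique among siblings in the same folder.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class EntryMeta:
    """Illustrative subset of what an entry's meta-file might hold."""
    shortname: str
    displayname: str = ""
    description: str = ""
    tags: list[str] = field(default_factory=list)
    # Each entry gets its own globally unique identifier.
    uuid: str = field(default_factory=lambda: str(uuid4()))
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def add_entry(siblings: dict[str, EntryMeta], meta: EntryMeta) -> None:
    """Register an entry in its folder, enforcing shortname uniqueness among siblings."""
    if meta.shortname in siblings:
        raise ValueError(f"shortname {meta.shortname!r} already exists in this folder")
    siblings[meta.shortname] = meta


def to_meta_json(meta: EntryMeta) -> str:
    """Serialize the meta information as it could be written to a meta-file."""
    return json.dumps(asdict(meta), indent=2)


folder: dict[str, EntryMeta] = {}
todo = EntryMeta(shortname="todo", description="Things to do", tags=["personal"])
add_entry(folder, todo)
print(to_meta_json(todo))
```

Adding a second entry with the shortname `todo` to the same `folder` raises `ValueError`, while the same shortname in a different folder is accepted.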

File disposition scheme

| File path | Description |
|-----------|-------------|
| [sub/path]/.dm/meta.folder.json | The meta file of a folder |
| [sub/path]/.dm/[entryshortname]/meta.[entrytype].json | The meta file of a regular entry |
| [sub/path]/[entrypayload] | The optional payload file of the entry. It must not clash with another payload file within that folder |
| [sub/path]/.dm/[entryshortname]/attachments.[attachmenttype]/meta.[attachmentshortname].json | The meta file of an attachment |
| [sub/path]/.dm/[entryshortname]/attachments.[attachmenttype]/[attachmentpayload] | The optional attachment payload file. It must not clash with meta.[xxx].json or another payload file within that folder |

With this scheme, only proper entry main payload files appear to the user. All meta data and attachments data is stored in the hidden (.dm) folders.
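
As a quick illustration of the scheme, the helpers below assemble the meta-file paths from their parts. The function names are hypothetical and the code only mirrors the path patterns listed above; it does not touch the file system.

```python
from pathlib import PurePosixPath


def folder_meta_path(subpath: str) -> PurePosixPath:
    """[sub/path]/.dm/meta.folder.json"""
    return PurePosixPath(subpath) / ".dm" / "meta.folder.json"


def entry_meta_path(subpath: str, shortname: str, entry_type: str) -> PurePosixPath:
    """[sub/path]/.dm/[entryshortname]/meta.[entrytype].json"""
    return PurePosixPath(subpath) / ".dm" / shortname / f"meta.{entry_type}.json"


def attachment_meta_path(
    subpath: str, entry_shortname: str, attachment_type: str, attachment_shortname: str
) -> PurePosixPath:
    """[sub/path]/.dm/[entryshortname]/attachments.[attachmenttype]/meta.[attachmentshortname].json"""
    return (
        PurePosixPath(subpath)
        / ".dm"
        / entry_shortname
        / f"attachments.{attachment_type}"
        / f"meta.{attachment_shortname}.json"
    )


print(folder_meta_path("content/stuff"))
# content/stuff/.dm/meta.folder.json
print(entry_meta_path("content/stuff", "todo", "content"))
# content/stuff/.dm/todo/meta.content.json
print(attachment_meta_path("content/stuff", "todo", "comment", "first"))
# content/stuff/.dm/todo/attachments.comment/meta.first.json
```

The entry type `content` and attachment type `comment` are example values chosen for the demonstration.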

Installation

Requirements

  • git
  • jq
  • python == 3.11
  • pip
  • redis >= 7.2
  • RedisJSON (rejson) >= 2.6
  • RediSearch >= 2.8

Steps

git clone https://github.com/edraj/dmart.git

cd dmart 

# Make logs folder
mkdir logs

# Copy sample spaces structure
cp -a sample/spaces ../


cd backend

# Install python modules
pip install --user -r requirements.txt

# Optionally, fine-tune your configuration
cp config.env.sample config.env

# Start DMART microservice
./main.py


# Optionally: check admin folder for systemd scripts

Automated testing

Installing python dependencies

pip install --user -r test-requirements.txt

Running

cd backend
./curl.sh
pytest

Using the Admin UI tool

DMART has a comprehensive Admin UI that interacts with the backend entirely via the formal API. It is built with Svelte, Routify3 and SvelteStrap.

cd dmart/frontend
yarn install

# Configure the dmart server backend url in src/config.ts

# To run in Development mode
yarn dev

# To build and run in production / static file serving mode (i.e. w/o nodejs) using Caddy
yarn build
caddy run

Building tauri binary (Linux AppImage)

This allows packaging the admin tool as a desktop application.

# Regular build without inspection
yarn tauri build --bundles appimage

# To add inspection (right mouse click -> inspect)
yarn tauri build --bundles appimage --debug

Using the command line tool

DMART comes with a command line tool that can run from anywhere. It communicates with DMART over the api.

cd cli

# Create config.ini with proper access details (url, credentials ...etc)
cp config.ini.sample config.ini

# Install additional packages
pip install --user  -r requirements.txt

# Start the cli tool
./cli.py

Offline (aka airgapped) deployment

# On the "online" computer
# Remove any previous download cache and virtual environment (rmdir would fail on non-empty folders)
rm -rf ~/.pipi ~/.venv
virtualenv --python=/usr/bin/python3.11 ~/.venv # or your favorite py env virtualization tool
source ~/.venv/bin/activate
mkdir ~/.pipi
# under dmart/backend
pip download -d ~/.pipi/ $(cat *requirements.txt) virtualenv pip
rsync -av ~/.pipi/ TARGET_OFFLINE_SERVER:~/.pipi

# On the "offline" target server
pip install --no-index --find-links=~/.pipi virtualenv
virtualenv ~/.venv
source ~/.venv/bin/activate
pip install --no-index --find-links=~/.pipi --upgrade pip
pip install --no-index --find-links=~/.pipi -r requirements.txt -r test-requirements.txt -r plugins-requirements.txt

Running extra python checks

cd backend
ruff check .
mypy --explicit-package-bases --warn-return-any .

# Freeze pip modules versions
pip freeze > pip.freeze

Sample usecases

DMART is a low-level general-purpose data platform. Hence it could apply to a reasonably wide variety of usecases.

The one usecase we are currently focused on building is a universal online presence platform: a tool that combines CMS, Messaging, and Collaboration in a federated fashion (borrowing from how email federates its messaging service).

Simply put, this will help small teams, individuals and interest groups to quickly launch a website (that is their own) that is indexable by search engines, provision users, and allow everyone to author and interact with content (both from the website and the mobile app). With the leverage of DMART, all information elements are structured as entries within the specific hierarchy desired by the admin user.

Coming soon ...

Universal online presence platform sample webapp (Svelte4, Routify3, Sveltestrap)

Universal online presence platform sample mobile app (Flutter)

Contributors

kefahi, saadadel, saadadel539, splimter, raniaabualnadi, amremaish, splimterxstartappz
