Digital Pāḷi Database

Building the DB

  1. Download this repo
  2. Get tipitaka-xml by running git submodule init && git submodule update
  3. Install nodejs
  4. Install poetry
  5. poetry install
  6. poetry run bash bash/initial_setup_run_once.sh
  7. poetry run bash bash/build_db.sh
  8. To run the database tests, you may need to install some additional packages.
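
Before running the steps above, it can help to confirm that the command-line tools they rely on are available on the PATH. The following is a minimal sketch and not part of the repository; the executable names `git`, `node`, and `poetry` are assumptions about how these tools are installed on your system.

```python
import shutil


def missing_tools(tools=("git", "node", "poetry")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

An empty result means the build steps should at least be able to start; it does not check tool versions.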

That should create an SQLite database, ./dpd.db, which can be accessed with DB Browser, DBeaver, SQLAlchemy, or your preferred method.

For a quick tutorial on how to access any information in the db with SQLAlchemy, see scripts/db_search_example.py.
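
As a quick check that the build produced a usable file, the database can also be opened with Python's standard `sqlite3` module. This is a rough sketch that only lists the tables it finds. It makes no assumptions about the schema, and the helper name `list_tables` is made up for this example.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path="dpd.db"):
    """Return the names of all tables in the SQLite file at `db_path`."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Only query the file if it exists, so that no empty database is created.
    if Path("dpd.db").exists():
        for table_name in list_tables():
            print(table_name)
    else:
        print("dpd.db not found; build the database first.")
```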

Build a complete database locally and extract all dictionaries

⚠️ WARNING: When sandhi/sandhi_splitter.py runs with the config option deconstructor.all_texts = yes, it will take several hours to complete.
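
To check this setting from a script before starting a long build, something like the following could be used. It is only a sketch: it assumes the option lives in an INI-style file with a `[deconstructor]` section, and the sample text stands in for the real config file, whose name and layout are not given here.

```python
import configparser

# Hypothetical file contents: the real config file name and layout are
# assumptions, inferred only from the option name deconstructor.all_texts.
EXAMPLE_CONFIG = """
[deconstructor]
all_texts = yes
"""


def all_texts_enabled(config_text):
    """Return True when [deconstructor] all_texts is set to a true value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # getboolean() accepts yes/no, true/false, on/off and 1/0.
    return parser.getboolean("deconstructor", "all_texts", fallback=False)


if __name__ == "__main__":
    if all_texts_enabled(EXAMPLE_CONFIG):
        print("all_texts is enabled: expect a run of several hours.")
```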

Starting with a fresh clone of the tip:

git clone --depth=1 https://github.com/digitalpalidictionary/dpd-db.git
cd dpd-db
git submodule init && git submodule update
poetry install
poetry run bash bash/build_and_make_all.sh

This creates the dpd.db SQLite database and also exports all dictionaries; see the exporter/share folder.

Code Structure

There are four parts to the code:

  1. Create the database and build up the tables of derived data.
  2. Add new words, edit and update the db with a GUI.
  3. Run data integrity tests on the db.
  4. Compile all the parts and export into various dictionary formats.

About the database

  • The DpdHeadwords and DpdRoots tables are the heart of the db; everything else is derived from them.
  • The relationship DpdHeadwords.rt gives access to any root information, for example DpdHeadwords.rt.root_meaning.
  • There are also many @properties in db/models.py for accessing useful derived information.
  • The DpdHeadwords table also contains lists of inflections of every word in multiple scripts, as well as HTML inflection tables.
  • The FamilyCompound table holds the HTML of all the compound words which contain a specific word.
  • The FamilyRoot table holds the HTML of all the words with the same prefix and root.
  • The FamilySet table holds the HTML of all the words which belong to the same set, e.g. names of monks.
  • The FamilyWord table holds the HTML of all the words which are derived from a common word without a root.
  • The InflectionTemplates table holds the templates from which all the inflection tables are derived.
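
The headword-to-root relationship described above amounts to following a link from a headword row to its root row. The sketch below illustrates that idea with a toy in-memory database and the standard `sqlite3` module. Apart from `root_meaning`, every table and column name here (`headwords`, `roots`, `lemma`, `root_key`) is hypothetical and does not reflect the real dpd.db schema or the SQLAlchemy models in db/models.py.

```python
import sqlite3

# Toy in-memory schema. Table and column names other than root_meaning are
# hypothetical and do not reflect the real dpd.db layout.
connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE roots (root TEXT PRIMARY KEY, root_meaning TEXT);
    CREATE TABLE headwords (
        lemma TEXT PRIMARY KEY,
        root_key TEXT REFERENCES roots (root)
    );
    INSERT INTO roots VALUES ('√gam', 'go');
    INSERT INTO headwords VALUES ('gacchati', '√gam');
    """
)


def root_meaning_for(lemma):
    """Follow the headword -> root link and return the root's meaning."""
    row = connection.execute(
        """
        SELECT roots.root_meaning
        FROM headwords
        JOIN roots ON roots.root = headwords.root_key
        WHERE headwords.lemma = ?
        """,
        (lemma,),
    ).fetchone()
    return row[0] if row else None


print(root_meaning_for("gacchati"))  # prints: go
```

With the ORM, the same lookup is expressed as attribute access on a headword object, so no join has to be written by hand.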

Contributors

bdhrs, devamitta, gambhiro, bergentroll, bksubhuti
