datastories-unipi / minidb

An extremely minimal DB that can be used for educational purposes and rapid prototyping

License: GNU General Public License v3.0

Python 100.00%

minidb's People

Contributors

Stargazers

Watchers

Forkers

dimjim2 kontilenia aspiringdeveloper00 nickarkoulis bl-ever elenatzr amourgianosdimitris klisira tonytziorvas vagelisav mariakypraiou tsolomitieva antonisparaskevis giannhspetris cstergio geoza2000 mkanteler nancykomm anastasiamexa konstantinoszotos galiatsatouaikaterini ioannavouk christospolydorou dimitriskikidis kostastem georgeifa elelec anthonykalampogias mariosshub andreas-kostantinis gkonstantos diodimi periklispettas stergiospap ksol-b stefanosgregory iliasbrinias vasilisgianniskontilenia anampel elenkatsikadami john-maravelis petrosfatouros achilleskastanas vaspasia aretipetta pr0cod3rs panagiwta-fournari georgekolliniatis stathiskamarinopoulos wantedhool vbellos panagiotisandreatos billygrande krigol14 sokianterror parhsmaropoulos evridiki1 jasondokusis nickija danileni vaggelisvgs charoulis axilleasgr eleftheria-dimitriadou vaggv nick-tala jtouloupis cgkoutzigiannis gianniskap robertorapesis dimkoromilas nimi00 nikolas2000 stylianikal stavrostsoukalas petrostergios chrisprogrammer01 stathisp18056 gsamarascs kalotheo dimitriosstergiou nikkont stefanosbakas john-stent spirosgiannikakis dimitrisogdimolikas 0xcoto nektariakallioupi statiou ariskallergis labrosfrag gkoviou vaggelis1gav mikesiganos christinetheod evaggelia18 tsapkostas theodosissym pap-marianthi alexkokk

minidb's Issues

Performance analysis of current implementation

Analyse the current implementations of the available I/O commands and rewrite parts that are not optimised. Ease of understanding should not be sacrificed.

Server - Client - Task #9

Implement a server-client architecture in which the client sends queries to a port and receives the results. Also implement a very minimal SQL compiler that supports just a select query (select * from is enough) in order to demonstrate the server-client functionality.

This can be implemented using Golang (easier for this kind of thing) or Python.
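
As a rough illustration of the task above, here is a minimal Python sketch using the standard socketserver module. The TABLES dictionary, run_query, QueryHandler, send_query and demo are hypothetical names introduced only for this example and are not part of miniDB; the one-query-per-line wire format is likewise an assumption.

```python
import socket
import socketserver
import threading

# Hypothetical in-memory "database": table name -> list of rows.
TABLES = {"classroom": [("Packard", 101, 500), ("Painter", 514, 10)]}


def run_query(query):
    """Handle only `select * from <table>`; anything else is rejected."""
    tokens = query.strip().rstrip(";").split()
    if len(tokens) == 4 and [t.lower() for t in tokens[:3]] == ["select", "*", "from"]:
        table = tokens[3]
        if table in TABLES:
            return "\n".join(str(row) for row in TABLES[table])
        return f"error: unknown table {table}"
    return "error: only 'select * from <table>' is supported"


class QueryHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One query per line; reply with the result, then the connection closes.
        query = self.rfile.readline().decode("utf-8")
        self.wfile.write(run_query(query).encode("utf-8"))


def send_query(host, port, query):
    """Client side: send one query and read the full reply."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(query.encode("utf-8") + b"\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def demo():
    """Start a server on a free local port, send one query, return the reply."""
    # Port 0 asks the OS for any free port.
    with socketserver.TCPServer(("127.0.0.1", 0), QueryHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        reply = send_query(host, port, "select * from classroom")
        server.shutdown()
    return reply
```

Calling demo() runs both sides in one process; in practice the server and client would be separate scripts, and run_query would delegate to the real database object.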

Sql compiler - Task #12

Implement an SQL compiler that supports the following queries:

Select From Where
Update Where
Insert Into
Delete From Where
Create/Drop Table
Create Database
Select From Join
Create Index On

Change savedir structure and adding setup

Current savedir structure is:

current_dir/dbdata/
├── db1/
│ ├── table1.pkl
│ ├── table2.pkl
│ ├── ...
├── db2/
│ ├── table1.pkl
│ ├── table2.pkl
│ ├── ...

Changing this to a more standardized structure (for example, moving it under /home/ and making it hidden) and including a setup script that creates the tree would be nice.
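
A setup script along these lines could be sketched as follows. The hidden directory name .minidb and the base_dir parameter are assumptions made for illustration; the issue does not specify them.

```python
from pathlib import Path


def setup_savedir(db_names, base_dir=None):
    """Create <base>/.minidb/<db>/ for each database and return the root path.

    `.minidb` and `base_dir` are illustrative assumptions, not miniDB settings.
    """
    base = Path(base_dir) if base_dir is not None else Path.home()
    root = base / ".minidb"
    for name in db_names:
        # exist_ok makes the script safe to run more than once.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

Using pathlib keeps the script portable, since Path.home() resolves to the user's home directory on Linux, macOS and Windows.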

SQL Compiler

Add a usable SQL compiler that follows the compiler literature as much as possible. The SQL compiler needs to be easy to grasp and extend, since any newly added functionality will need to be accompanied by a new query.

ORM Integration.

Integrate miniDB into the popular ORM (Object Relational Mapping) tools.

Changing the project structure

Right now the main source files (database, table) are in the same directory as the others (btree and misc files). This should change to make the project more readable and, eventually, lead to PyPi.

I will read into this and maybe update the issue down the road.

Extra IO and Primary Key rethink - Task #10

Implement the following:

Group by
Select Distinct

Additionally rethink the way the primary key is currently stored and reimplement it in a more concise and effective way.

Finally, add support for multicolumn primary key.

CI/CD stuff - linter, codeconv etc.

Add the needed github actions and CI/CD tools to make contributing to the project easier. Some examples include Unit Tests, a linter etc.

GUI admin panel.

Add a GUI admin panel.

Table partitioning and Inheritance + Distributing - Task #8

Implement the following:

Table Inheritance
Table Partitioning

As well as the following:

Distribute a DB
Distribute a partitioned table

Add Docker Configuration

Add a Docker configuration (Dockerfile, Docker Compose) with a volume that stores data on the host machine, so that it is runnable under Windows 10.

PyPi upload

Upload to PyPi when all/most of the story issues functionality is added.

Add a linter

Add a linter to the project to ensure code quality. Preferably use Flake8 or Pylama (or any other linter).

Joins - Task #4

Implement the following types of Joins:

Outer Joins
Index-Nested-Loop-Join
Sort-Merge-Join
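
As an illustration of one of the join types listed above, here is a minimal sort-merge join sketch over plain lists of tuples. It is an inner equi-join written for this example and does not use miniDB's table classes; the function name and parameters are assumptions.

```python
from operator import itemgetter


def sort_merge_join(left, right, left_col, right_col):
    """Inner equi-join of two lists of tuples on left[left_col] == right[right_col]."""
    left = sorted(left, key=itemgetter(left_col))
    right = sorted(right, key=itemgetter(right_col))
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][left_col], right[j][right_col]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on the right side...
            j_end = j
            while j_end < len(right) and right[j_end][right_col] == lkey:
                j_end += 1
            # ...and pair it with every left row that has the same key.
            while i < len(left) and left[i][left_col] == lkey:
                for k in range(j, j_end):
                    result.append(left[i] + right[k])
                i += 1
            j = j_end
    return result
```

Handling runs of equal keys on both sides is the part that is easy to get wrong; advancing only one pointer on a match would drop rows when keys repeat.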

Push project to PyPi

The "intended" way of installing miniDB is currently through git (cloning and running an example .py file).

This poses a couple of problems, which are directly addressed if the project is pushed to PyPi:

In order to update to a newer version of miniDB, you'd have to re-clone the repository. (With pip, you can simply run pip install -U mini-db*.)
Installation is inconvenient: imagine having to git clone every Python package to a local directory (e.g. ~/Desktop); it'd be a bit of a mess! With pip, it's as simple as pip install mini-db, and you don't have to worry about where to install the package.
Usage is also a bit inconvenient, because working with relative paths implies you have to be careful with how you handle directories, as custom scripts and DB files are placed in the same directory as the source code of miniDB (see relevant issue (#1)). It also matters which path you run your Python scripts from!**

* minidb is taken, but mini-db is an available package name on PyPi! 😁

** This is because

from database import Database

tries to find database.py in your current path, so you essentially need to have all your scripts in the same directory, or import database will fail.

Repo folder tree restructure

Replace the current structure with one that is more professional and widely used + makes the repo easier to follow and understand.

This repo contains such info and can be useful

https://github.com/kriasoft/Folder-Structure-Conventions

Server-Client

Make a server for MiniDB. Maybe make a custom server or use an existing one.

Redundant is_locked(self, table_name) output

Are lines 428/429 in database.py necessary?

res = self.select('meta_locks', ['locked'], f'table_name=={table_name}', return_object=True).locked[0]
if res:
    print(f'Table "{table_name}" is currently locked.')
return res

Unless I'm missing something, the is_locked(self, table_name) function is supposed to return a boolean value depending on whether the input table is exclusively locked or not, but is it meaningful to also inform the user from within the function (either via print or logging.info), i.e. twice?

If not, I'll make sure my (future) PR will include this fix along with #38 & #47.

Btree refactoring - Task #2

Refactor the existing Btree implementation and:

Add Btree delete
Add support for multicolumn Btree indexes

SELECT DISTINCT - 10/50

Add support for distinct values in the SELECT clause by implementing Sort deduplication (if deduplication is needed).
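
Sort-based deduplication can be sketched in a few lines: sort the rows so that duplicates become adjacent, then keep a row only if it differs from the previous one. The function name is hypothetical, and the sketch assumes rows are tuples whose values can be compared with each other.

```python
def distinct_sorted(rows):
    """Return distinct rows by sorting and dropping adjacent duplicates."""
    result = []
    for row in sorted(rows):
        # After sorting, equal rows sit next to each other.
        if not result or row != result[-1]:
            result.append(row)
    return result
```

Note that the output comes back in sorted order rather than in the original insertion order.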

Journals/Logs - Task #7

Implement a logging/journaling system (WAL-like) for each db/table. It should support:

rollback (for up to N states back, i.e. roll back N commands. You can either pick a specific value of N or let the user choose)
command history (inserts, deletes etc)
printing verbose logs (as much info as possible. You pick what you think is important)
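
One possible shape for the rollback and command-history requirements is sketched below on a toy key-to-row table. JournaledTable is a hypothetical class for illustration only; the journal lives in memory, so this is not a real write-ahead log, which would persist each entry to disk before applying the change.

```python
class JournaledTable:
    """Toy key -> row table that journals every change so it can be undone."""

    def __init__(self):
        self.rows = {}
        self.journal = []  # entries: (command, key, previous_row_or_None)

    def insert(self, key, row):
        self.journal.append(("insert", key, self.rows.get(key)))
        self.rows[key] = row

    def delete(self, key):
        self.journal.append(("delete", key, self.rows.get(key)))
        self.rows.pop(key, None)

    def rollback(self, n=1):
        """Undo the last n journaled commands, newest first."""
        for _ in range(min(n, len(self.journal))):
            _command, key, previous = self.journal.pop()
            if previous is None:
                # The key did not exist before this command.
                self.rows.pop(key, None)
            else:
                self.rows[key] = previous

    def history(self):
        """Command history as (command, key) pairs, oldest first."""
        return [(command, key) for command, key, _ in self.journal]
```

Storing the previous value with each entry is what makes undo possible without replaying the whole history from the start.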

Hash indexes - Task #3

Implement hash index support as well as:

Hash indexes that support different hash functions/keys
Hash search
Hash Join
Hash Visualization tool
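
For the hash join item above, a minimal build-and-probe sketch over lists of tuples might look like this. It uses Python's built-in hashing through a dictionary rather than a custom hash index, and the function name and parameters are assumptions for illustration.

```python
from collections import defaultdict


def hash_join(left, right, left_col, right_col):
    """Inner equi-join: build a hash table on `left`, then probe it with `right`."""
    buckets = defaultdict(list)
    for row in left:
        buckets[row[left_col]].append(row)
    result = []
    for row in right:
        # .get avoids creating empty buckets for keys that have no match.
        for match in buckets.get(row[right_col], []):
            result.append(match + row)
    return result
```

Building the table on the smaller input keeps memory use down, which is the usual choice for this kind of join.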

Add API for other languages.

Add an API for other languages like Java, JS, C#, ...

Py 2/3 compatibility

Although Python 2 is officially deprecated, many users still use it. Since miniDB (essentially database.py) does not appear to be heavily Py3-dependent, how about we make it Py2/3 compatible? Doesn't seem like a lot of work, but if it's been tested, what code segments are there that are incompatible with Py2 and require fixing?

Locks - Task #6

Implement the following types of locks:

Exclusive/Share table wide
Exclusive/Share row wide

MiniDB currently supports only one lock type, the exclusive table wide lock. Implement the rest of the types listed above and make sure that all the needed changes to the corresponding meta-tables are made.
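
The compatibility rule between shared and exclusive locks can be sketched as below. LockTable is a hypothetical class that keeps its state in a dictionary instead of miniDB's meta-tables; a resource can be a table name or a (table, row id) pair. The sketch does not make table-wide and row-wide locks aware of each other, which a full implementation would need.

```python
class LockTable:
    """Tracks shared ("S") and exclusive ("X") locks per resource."""

    def __init__(self):
        self.locks = {}  # resource -> (mode, set of holders)

    def acquire(self, resource, holder, mode):
        """Return True if the lock was granted, False if it conflicts."""
        current = self.locks.get(resource)
        if current is None:
            self.locks[resource] = (mode, {holder})
            return True
        held_mode, holders = current
        # Only shared locks are compatible with each other.
        if mode == "S" and held_mode == "S":
            holders.add(holder)
            return True
        return False

    def release(self, resource, holder):
        current = self.locks.get(resource)
        if current is None:
            return
        _mode, holders = current
        holders.discard(holder)
        if not holders:
            del self.locks[resource]
```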

Reimplementing performance critical functions - Task #13

Analyze the performance of various performance critical functions, pick the slowest/most used ones and reimplement them using a compiled language (C/C++/Rust). For example, implementing the file IO (saving files to something that is not binary) should be interesting.

Dashboard - SQL command parser

Design a simple and minimal dashboard that shows meta-tables and/or stats for each database and supports the execution of SQL queries (REPL-like).

File I/O - Not binary

Design a scheme for the database/table files that is closer to what traditional RDBMSs use. Test this new scheme and add it to the library only if there is no major performance penalty.

Ordered Files/Tables - Task #1

Implement sorted files/tables as well as the following:

Add support for a separate insert stack/queue. Selects should work (i.e. should scan both the table and the stack).
Add a method that inserts all insert stack records to the original table when the stack gets too big.
Add binary search support
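
The three items above fit together roughly as in the following sketch, which keeps rows sorted on their first column and uses the standard bisect module for binary search. SortedTable and max_stack are hypothetical names; the threshold at which the stack is merged is an arbitrary choice for the example.

```python
import bisect


class SortedTable:
    """Rows kept sorted on their first column, plus an unsorted insert stack."""

    def __init__(self, max_stack=4):
        self.rows = []   # sorted by row[0]
        self.keys = []   # parallel list of keys, used for binary search
        self.stack = []  # recent inserts, not yet merged
        self.max_stack = max_stack

    def insert(self, row):
        self.stack.append(row)
        if len(self.stack) > self.max_stack:
            self.flush()

    def flush(self):
        """Merge the insert stack into the sorted rows and empty it."""
        self.rows = sorted(self.rows + self.stack, key=lambda r: r[0])
        self.keys = [r[0] for r in self.rows]
        self.stack = []

    def select(self, key):
        """Binary-search the sorted rows, then scan the stack linearly."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.rows[lo:hi] + [r for r in self.stack if r[0] == key]
```

Inserts stay cheap because they only append to the stack, while the cost of re-sorting is paid once per flush.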

Distinguish source code from examples

It appears that the true package dependencies are database.py, table.py and btree.py, whereas the rest of the files are sample applications of miniDB. I would suggest moving the example scripts to a new directory (e.g. /examples) in order to distinguish them from the source code of miniDB.

Server-Client protocol

Design and implement a minimal server-client communication protocol. Since such an implementation can be very taxing with regards to performance, whether this will be added and to what extent will be discussed when the performance penalty added is evaluated.

Add quicklook functionality

Add a quicklook()/preview() functionality (matplotlib-based) to plot the size of each DB as a percentage (e.g. as a pie chart) to help the user navigate/get an idea of the distribution of data and information across their DB system in a quick and simple manner.

Add unit and functional tests.

Add unit tests and functional tests. Preferably also add full end-to-end tests.

CI/CD pipelines

Add CI/CD pipelines (preferably using GitHub Actions) and add Jenkins for extra functionality. Also make passing the unit tests mandatory before a pull request can be merged.

Cost-based optimization - Task #5

Think and implement an effective way of measuring the cost of the various IO functions (inserts, deletes etc).

Measuring run times is an option, but it should be added as a separate tool and not by altering all the existing methods. You should also create or use a tool that analyzes the runtime as extensively as possible.
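
One way to measure run times from the outside, without editing the existing methods, is to wrap them at runtime. The sketch below is an assumption about how such a tool could look; TIMINGS, timed, instrument and report are hypothetical names, and wall-clock time is only a rough stand-in for I/O cost.

```python
import functools
import time
from collections import defaultdict

# Collected run times in seconds, keyed by function name.
TIMINGS = defaultdict(list)


def timed(func):
    """Decorator that records each call's wall-clock run time in TIMINGS."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the call raises.
            TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper


def instrument(cls, method_names):
    """Wrap the named methods of `cls` from the outside, without editing the class."""
    for name in method_names:
        setattr(cls, name, timed(getattr(cls, name)))


def report():
    """Return {name: (calls, total_seconds)} for everything recorded so far."""
    return {name: (len(runs), sum(runs)) for name, runs in TIMINGS.items()}
```

For a more detailed breakdown, the standard library's cProfile module can be run over the same workload without any code changes at all.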

Add project-wide config.

Add project-wide configuration.

Code directory tree.

Add a logical directory tree to the source code

Make miniDB dependency free

Removing the last 2 deps (numpy will be removed asap) would be VERY nice.

This means:

Replacing tabulate with a hand made table print function
Detecting whether graphviz is installed and either saving the dot file (if it is not) or plotting (if it is). Graphviz would still be recommended since it is very useful for plotting btrees, but miniDB would be fully functional without it.
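
Both replacements can be done with the standard library alone. The sketch below shows a hand-made table formatter and a check for the Graphviz dot executable via shutil.which; format_table and graphviz_available are hypothetical names, and the output layout is an arbitrary choice rather than a copy of tabulate's.

```python
import shutil


def format_table(headers, rows):
    """Render rows as a plain-text table using only the standard library."""
    cells = [[str(h) for h in headers]] + [[str(c) for c in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(line[i]) for line in cells) for i in range(len(headers))]

    def fmt(line):
        return " | ".join(cell.ljust(w) for cell, w in zip(line, widths))

    separator = "-+-".join("-" * w for w in widths)
    return "\n".join([fmt(cells[0]), separator] + [fmt(line) for line in cells[1:]])


def graphviz_available():
    """True if the Graphviz `dot` executable is on PATH."""
    return shutil.which("dot") is not None
```

When graphviz_available() returns False, the btree plotting code could write the dot source to a file and tell the user where it was saved.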

Suppress prints

Replace prints with logging.info, logging.warning (or warnings.warn) and raise (or logging.exception) for information, warnings and errors respectively. The parentheses (i.e. "which one should I use?") go case-by-case, depending on the type of warning/error the user is presented with.
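
The mapping described above could look like the following sketch: logging.info for information, warnings.warn for a harmless no-op, and raise for a real error. The logger name "minidb" and both functions are hypothetical examples, not code from database.py.

```python
import logging
import warnings

logger = logging.getLogger("minidb")  # logger name is an assumption


def check_lock(table_name, locked):
    """Return the lock state; report it through logging instead of print."""
    if locked:
        logger.info('Table "%s" is currently locked.', table_name)
    return locked


def drop_column(columns, name):
    """Warn on a harmless no-op, raise on a real error."""
    if not columns:
        raise ValueError("table has no columns")
    if name not in columns:
        warnings.warn(f'column "{name}" does not exist; nothing dropped')
        return columns
    return [c for c in columns if c != name]
```

With this in place the user decides what to see, for example by calling logging.basicConfig(level=logging.INFO) in their own script.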

Add generic methods.

Add generic and reusable methods for frequently duplicated code blocks.

Add SQL interpreter.

Add an SQL interpreter to MiniDB.

Privileges - Task #11

Add support for different types of users with different privileges. Specifically:

Implement different types of users and groups (with different privileges)
Different types per DB and per Table

Hint: Create new meta tables that contain users, types etc. Authentication is a nice extra but is not necessary.

Task 3.4 - Reimplementing performance critical functions

A. Approach
Based on creating a DLL implemented in C#, which is called through a Python file. The implemented functions are presented in the C# file and were based on the database file of minidb. The code is attached in folder A.
.A.zip

B. Approach
Based on creating Python routines that convert a binary file to a non-binary format and read that file back. The non-binary file can even be a plain txt file. The reverse direction is also shown, as well as how the Python files are called for the database file of minidb. The program is managed through the main.py file.

The code is attached in folder B.
B.zip
@giorgostheo @0xCoto

Fix exception catching (prone to irrelevant error ignoring)

Exceptions in database.py are caught and passed, regardless of what the exception actually is. This is a problem because you may get a different exception in a different system under certain conditions, and the problem may be incorrectly ignored.

This issue appears in line 26, line 37 and line 566

A simple fix for this could be:

except Exception_name:
    print(...)

(also, if you have a command under the exception (e.g. print()), you don't need to append pass to break out of the exception and continue)
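
To make the suggested fix concrete, here is a small sketch that catches only the exception it expects and lets everything else propagate. load_table is a hypothetical function written for this example; it does not reproduce the code at the line numbers mentioned above.

```python
import pickle


def load_table(path):
    """Load a pickled table, handling only the failure we expect."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        print(f"No table file at {path}; starting empty.")
        return None
    # Anything else (e.g. a corrupt or empty file) propagates to the caller.
```

A missing file is treated as a normal condition, while an unreadable file still surfaces as an error instead of being silently ignored.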

Await pull request...

Logo

Design a logo... that's it

Documentation - Read the Docs

Document the current codebase both in code and externally. Since Read the Docs is the standard for open source (especially python), use that. This needs to be ongoing since any and all new code needs to be documented. Self hosting is not a top priority, but it can happen.