minidb's People

Contributors

0xcoto, bl-ever, giorgostheo, yannistheodoridis

minidb's Issues

Server - Client - Task #9

Implement a server-client architecture in which the client sends queries to the server over a port and receives the results. Also, implement a very minimal SQL compiler that supports only a select query (SELECT * FROM is enough) to demonstrate the server-client functionality.

This can be implemented using Golang (easier for this kind of thing) or Python.
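To illustrate the idea, here is a minimal sketch in Python, one of the two suggested languages. It assumes a hypothetical in-memory `TABLES` dict in place of miniDB's storage and a one-query-per-connection, newline-terminated text protocol. `execute`, `serve_once` and `send_query` are illustrative names, not miniDB APIs.

```python
import re
import socket
import threading

# Hypothetical in-memory tables standing in for miniDB's storage.
TABLES = {"students": [("id", "name"), (1, "Alice"), (2, "Bob")]}

# The only supported statement: SELECT * FROM <table>
SELECT_ALL = re.compile(r"^\s*select\s+\*\s+from\s+(\w+)\s*;?\s*$", re.IGNORECASE)


def execute(query: str) -> str:
    """'Compile' and run a query, returning the result rows as CSV-like text."""
    match = SELECT_ALL.match(query)
    if match is None:
        return "ERROR: only SELECT * FROM <table> is supported"
    table = TABLES.get(match.group(1))
    if table is None:
        return f"ERROR: no such table {match.group(1)!r}"
    return "\n".join(",".join(map(str, row)) for row in table)


def serve_once(server: socket.socket) -> None:
    """Accept one client, read one newline-terminated query, send back the result."""
    conn, _ = server.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        query = reader.readline()
        conn.sendall(execute(query).encode("utf-8"))


def send_query(port: int, query: str) -> str:
    """Client side: send a query and read the reply until the server closes."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(query.encode("utf-8") + b"\n")
        sock.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    server = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick one
    port = server.getsockname()[1]
    threading.Thread(target=serve_once, args=(server,), daemon=True).start()
    print(send_query(port, "SELECT * FROM students"))
    server.close()
```

A real server would loop over `accept()` and hand each connection to its own thread instead of serving a single client.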

Sql compiler - Task #12

Implement an SQL compiler that supports the following queries:

  • Select From Where
  • Update Where
  • Insert Into
  • Delete From Where
  • Create/Drop Table
  • Create Database
  • Select From Join
  • Create Index On

Change savedir structure and add setup

Current savedir structure is:

current_dir/dbdata/
├── db1/
│ ├── table1.pkl
│ ├── table2.pkl
│ ├── ...
├── db2/
│ ├── table1.pkl
│ ├── table2.pkl
│ ├── ...

It would be nice to change this to a more standardized structure (e.g. moving it under /home/ and making it hidden) and to include a setup script that creates the tree.
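A possible setup script, sketched under the assumption that the hidden tree lives at `~/.minidb/dbdata`. That location is hypothetical, since the issue only suggests something under /home/, and `setup_savedir` is an illustrative name.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical default: a hidden directory in the user's home directory.
DEFAULT_ROOT = Path.home() / ".minidb" / "dbdata"


def setup_savedir(root: Path = DEFAULT_ROOT, databases: Iterable[str] = ()) -> List[Path]:
    """Create root/ and one root/<db>/ directory per database; return the db paths.

    exist_ok=True makes the script idempotent: running it twice is harmless.
    """
    root.mkdir(parents=True, exist_ok=True)
    db_dirs = []
    for name in databases:
        db_dir = root / name
        db_dir.mkdir(exist_ok=True)
        db_dirs.append(db_dir)
    return db_dirs
```

Calling `setup_savedir(databases=["db1", "db2"])` would create `~/.minidb/dbdata/db1/` and `~/.minidb/dbdata/db2/`. Table `.pkl` files would then be saved inside them as before.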

SQL Compiler

Add a usable SQL compiler that follows the compiler literature as closely as possible. The SQL compiler needs to be easy to understand and extend, since any newly added functionality will need to be accompanied by a new query.
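A minimal sketch of the classic textbook split into a lexer (characters to tokens), a recursive-descent parser (tokens to an abstract syntax tree) and an AST node. It covers only `SELECT cols FROM table [WHERE col op literal]`. All names (`tokenize`, `Parser`, `Select`, `compile_sql`) are hypothetical, not part of miniDB.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# --- Lexical analysis: turn the query string into (kind, text) tokens ---
TOKEN_RE = re.compile(r"\s*(?:(\d+)|('[^']*')|(\w+)|(==|!=|<=|>=|[<>=*,;]))")
KEYWORDS = {"SELECT", "FROM", "WHERE"}
OPERATORS = {"=", "==", "!=", "<", ">", "<=", ">="}


def tokenize(sql: str) -> List[Tuple[str, str]]:
    tokens, pos, sql = [], 0, sql.rstrip()
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, string, word, symbol = m.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif string:
            tokens.append(("STRING", string[1:-1]))  # strip the quotes
        elif word:
            kind = "KEYWORD" if word.upper() in KEYWORDS else "IDENT"
            tokens.append((kind, word.upper() if kind == "KEYWORD" else word))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = m.end()
    return tokens


# --- Syntax analysis: recursive descent producing an abstract syntax tree ---
@dataclass
class Select:
    columns: List[str]                               # ["*"] means all columns
    table: str
    where: Optional[Tuple[str, str, object]] = None  # (column, operator, value)


class Parser:
    """Grammar: SELECT (* | ident {, ident}) FROM ident [WHERE ident op literal] [;]"""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", "")

    def expect(self, kind, text=None):
        tok = self.peek()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"expected {text or kind}, got {tok[1] or 'end of input'}")
        self.i += 1
        return tok[1]

    def accept(self, kind, text):
        if self.peek() == (kind, text):
            self.i += 1
            return True
        return False

    def parse_select(self) -> Select:
        self.expect("KEYWORD", "SELECT")
        if self.accept("SYMBOL", "*"):
            columns = ["*"]
        else:
            columns = [self.expect("IDENT")]
            while self.accept("SYMBOL", ","):
                columns.append(self.expect("IDENT"))
        self.expect("KEYWORD", "FROM")
        table = self.expect("IDENT")
        where = None
        if self.accept("KEYWORD", "WHERE"):
            column = self.expect("IDENT")
            op = self.expect("SYMBOL")
            if op not in OPERATORS:
                raise SyntaxError(f"unknown operator {op}")
            if self.peek()[0] == "NUMBER":
                value = int(self.expect("NUMBER"))
            else:
                value = self.expect("STRING")
            where = (column, op, value)
        self.accept("SYMBOL", ";")
        self.expect("EOF")
        return Select(columns, table, where)


def compile_sql(sql: str) -> Select:
    return Parser(tokenize(sql)).parse_select()
```

A later stage would consume the `Select` node, e.g. by mapping it onto the corresponding `Database` method. Supporting a new statement would then mean adding a keyword, one `parse_*` method and one AST class.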

ORM Integration

Integrate miniDB with popular ORM (Object Relational Mapping) tools.

Changing the project structure

Right now, the main source files (database, table) are in the same directory as the others (btree and misc files). This should change to make the project more readable and, eventually, to allow publishing it to PyPi.

I will read into this and maybe update the issue down the road.

Extra IO and Primary Key rethink - Task #10

Implement the following:

  • Group by
  • Select Distinct

Additionally rethink the way the primary key is currently stored and reimplement it in a more concise and effective way.

Finally, add support for multicolumn primary key.
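One way to make the primary key both concise and multicolumn is to store rows in a dict keyed by the tuple of key-column values. Duplicate detection then becomes a constant-time lookup. The `Table` class below is hypothetical (not miniDB's) and also shows a simple hash-based GROUP BY with a single aggregate.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


class Table:
    """Toy table whose primary key may span several columns (a composite key)."""

    def __init__(self, columns: Sequence[str], primary_key: Sequence[str]):
        self.columns = list(columns)
        # Positions of the key columns; the key itself is the tuple of their values.
        self.pk_idx = [self.columns.index(c) for c in primary_key]
        self.rows: Dict[Tuple, list] = {}  # key tuple -> row

    def insert(self, row: Sequence) -> None:
        key = tuple(row[i] for i in self.pk_idx)
        if key in self.rows:
            raise ValueError(f"duplicate primary key {key}")
        self.rows[key] = list(row)

    def group_by(self, column: str, agg_column: str, agg: Callable[[List], object]) -> Dict:
        """Equivalent of: SELECT column, agg(agg_column) FROM table GROUP BY column."""
        g, a = self.columns.index(column), self.columns.index(agg_column)
        groups = defaultdict(list)
        for row in self.rows.values():
            groups[row[g]].append(row[a])
        return {value: agg(values) for value, values in groups.items()}
```

For example, `Table(["course", "student", "grade"], primary_key=["course", "student"])` lets the same student appear in several courses but not twice in the same course.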

CI/CD stuff - linter, codecov etc.

Add the needed GitHub Actions and CI/CD tools to make contributing to the project easier. Some examples include unit tests, a linter, etc.

Add Docker Configuration

Add a Docker configuration (Dockerfile, docker-compose) with a volume that stores the data on the host machine, so that miniDB can also run under Windows 10.

PyPi upload

Upload to PyPi once all (or most) of the functionality described in the story issues has been added.

Add a linter

Add a linter to the project to ensure code quality. Preferably use Flake8 or Pylama (or any other linter).

Joins - Task #4

Implement the following types of Joins:

  • Outer Joins
  • Index-Nested-Loop-Join
  • Sort-Merge-Join
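Of the three join types, sort-merge join is the most self-contained to sketch. The version below assumes rows are plain tuples and joins on equality of one column from each side. Outer joins and index-nested-loop join are not covered, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def sort_merge_join(left: Sequence[tuple], right: Sequence[tuple],
                    lkey: int, rkey: int) -> List[Tuple]:
    """Equi-join left[lkey] == right[rkey] by sorting both inputs and merging them."""
    left = sorted(left, key=lambda r: r[lkey])
    right = sorted(right, key=lambda r: r[rkey])
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lv, rv = left[i][lkey], right[j][rkey]
        if lv < rv:
            i += 1
        elif lv > rv:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product,
            # so duplicate join keys are handled correctly.
            i_end = i
            while i_end < len(left) and left[i_end][lkey] == lv:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][rkey] == rv:
                j_end += 1
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    out.append(lrow + rrow)
            i, j = i_end, j_end
    return out
```

If either input is already sorted on the join column (e.g. an ordered file), its sort step can be skipped, which is where this algorithm pays off.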

Push project to PyPi

The "intended" way of installing miniDB is currently through git (cloning and running an example .py file).

This poses a couple of problems, which are directly addressed if the project is pushed to PyPi:

  • In order to update to a newer version of miniDB, you'd have to re-clone the repository. With pip, you can simply run pip install -U mini-db*
  • Installation is inconvenient (imagine having to git clone every Py package to a local directory (e.g. ~/Desktop) - it'd be a bit of a mess!). With pip, it's as simple as pip install mini-db and you don't have to worry about "where should I install the package"
  • Usage is also a bit inconvenient, because working with relative paths implies you have to be careful with how you handle directories, as custom scripts and DB files are placed in the same directory as the source code of miniDB (see relevant issue (#1)). It also matters which path you run your Python scripts from!**

* minidb is taken, but mini-db is an available package name on PyPi! 😁

** This is because

from database import Database

tries to find database.py in your current path, so you essentially need to have all your scripts in the same directory, or import database will fail.

Server-Client

Make a server for miniDB, either by writing a custom one or by using an existing one.

Redundant is_locked(self, table_name) output

Are lines 428/429 in database.py necessary?

res = self.select('meta_locks', ['locked'], f'table_name=={table_name}', return_object=True).locked[0]
if res:
    print(f'Table "{table_name}" is currently locked.')
return res

Unless I'm missing something, the is_locked(self, table_name) function is supposed to return a boolean value indicating whether the input table is exclusively locked, so is it meaningful to also inform the user from within the function (via print or logging.info), i.e. twice?

If not, I'll make sure my (future) PR will include this fix along with #38 & #47.

Btree refactoring - Task #2

Refactor the existing Btree implementation and:

  • Add Btree delete

  • Add support for multicolumn Btree indexes

SELECT DISTINCT - 10/50

Add support for distinct values in the SELECT clause by implementing sort-based deduplication (only when deduplication is needed).
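Sort-based deduplication works because sorting places equal rows next to each other, so a single linear pass can drop the repeats. Here is a minimal sketch, assuming rows are tuples of mutually comparable values (e.g. not a mix of `None` and `int`). The function name is hypothetical.

```python
from typing import List, Sequence


def distinct_by_sort(rows: Sequence[tuple]) -> List[tuple]:
    """SELECT DISTINCT via sort deduplication: sort, then keep only the first of each run."""
    result = []
    for row in sorted(rows):
        if not result or row != result[-1]:
            result.append(row)
    return result
```

Deduplication can be skipped entirely when the selected columns include the whole primary key, since every row is then already unique.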

Journals/Logs - Task #7

Implement a logging/journaling system (WAL-like) for each db/table. It should support:

  • rollback (up to N states back, i.e. roll back the last N commands; you can either pick a specific value of N or let the user choose it)
  • command history (inserts, deletes etc)
  • printing verbose logs (as much info as possible. You pick what you think is important)
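Here is a minimal in-memory sketch of journaling with rollback. It assumes each command affects a single row identified by a key. Before applying a command, it records the row's before-image, so undoing means restoring those images in reverse order; `max_undo` plays the role of N. A real WAL would also write each journal entry to disk before changing the table, which this sketch omits. `JournaledTable` and the logger name are hypothetical.

```python
import copy
import logging
from collections import deque

logger = logging.getLogger("minidb.journal")  # hypothetical logger name
_MISSING = object()  # marks "row did not exist before this command"


class JournaledTable:
    """Toy table that journals the before-image of every command it applies."""

    def __init__(self, max_undo: int = 10):
        self.rows = {}                          # key -> row
        self.journal = deque(maxlen=max_undo)   # oldest entries fall off automatically

    def _log(self, command, key):
        before = copy.deepcopy(self.rows[key]) if key in self.rows else _MISSING
        self.journal.append((command, key, before))
        logger.info("%s key=%r", command, key)  # verbose log of the command history

    def insert(self, key, row):
        if key in self.rows:
            raise KeyError(f"duplicate key {key!r}")
        self._log("insert", key)
        self.rows[key] = row

    def update(self, key, row):
        if key not in self.rows:
            raise KeyError(key)
        self._log("update", key)
        self.rows[key] = row

    def delete(self, key):
        if key not in self.rows:
            raise KeyError(key)
        self._log("delete", key)
        del self.rows[key]

    def history(self):
        return [(command, key) for command, key, _ in self.journal]

    def rollback(self, n: int = 1):
        """Undo the last n commands, newest first."""
        if n > len(self.journal):
            raise ValueError(f"only {len(self.journal)} command(s) can be rolled back")
        for _ in range(n):
            _, key, before = self.journal.pop()
            if before is _MISSING:
                del self.rows[key]       # the command created the row
            else:
                self.rows[key] = before  # restore the previous version
```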

Hash indexes - Task #3

Implement hash index support as well as:

  • Hash indexes that support different hash functions/keys

  • Hash search

  • Hash Join

  • Hash Visualization tool
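The following sketch shows a bucketed hash index whose hash function is a parameter, together with a textbook hash join. The class and function names are hypothetical. The "visualization" is only a text histogram of bucket occupancy.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence


class HashIndex:
    """Bucketed hash index on one column with a pluggable hash function."""

    def __init__(self, rows: Sequence[tuple], column: int, n_buckets: int = 8,
                 hash_fn: Callable[[Hashable], int] = hash):
        self.column, self.n_buckets, self.hash_fn = column, n_buckets, hash_fn
        self.buckets: List[List[tuple]] = [[] for _ in range(n_buckets)]
        for row in rows:
            self.buckets[self._bucket(row[column])].append(row)

    def _bucket(self, key) -> int:
        return self.hash_fn(key) % self.n_buckets

    def search(self, key) -> List[tuple]:
        # Only one bucket is scanned; the equality check filters out hash collisions.
        return [r for r in self.buckets[self._bucket(key)] if r[self.column] == key]

    def visualize(self) -> str:
        """One line per bucket, one '#' per stored row."""
        return "\n".join(f"{i:>3}: {'#' * len(b)}" for i, b in enumerate(self.buckets))


def hash_join(build: Sequence[tuple], probe: Sequence[tuple], bkey: int, pkey: int) -> List[tuple]:
    """Hash join: build a hash table on one input, then probe it with the other."""
    table: Dict[Hashable, List[tuple]] = defaultdict(list)
    for row in build:
        table[row[bkey]].append(row)
    return [b + p for p in probe for b in table.get(p[pkey], [])]
```

A skewed `hash_fn` shows up immediately in `visualize()` as a few long bars, which makes comparing hash functions straightforward.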

Py 2/3 compatibility

Although Python 2 is officially deprecated, many users still use it. Since miniDB (essentially database.py) does not appear to be heavily Py3-dependent, how about we make it Py2/3 compatible? It doesn't seem like a lot of work, but has this been tested? Which code segments are incompatible with Py2 and would require fixing?

Locks - Task #6

Implement the following types of locks:

  • Exclusive/shared, table-wide
  • Exclusive/shared, row-wide

MiniDB currently supports only one lock type, the exclusive table-wide lock. Implement the rest of the types listed above and make sure that all the needed changes to the corresponding meta-tables are made.
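The core of shared/exclusive locking is a compatibility check: shared (S) locks coexist, while an exclusive (X) lock conflicts with everything. In this hypothetical sketch, a resource is either a table name or a `(table, row_id)` pair. A real implementation would keep this state in miniDB's meta-tables instead of a dict.

```python
from typing import Dict, Hashable

# Can a new request (second element) be granted while someone else holds the
# first? Shared locks coexist; an exclusive lock conflicts with everything.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}


class LockManager:
    """Toy lock table; a resource is a table name or a (table, row_id) pair."""

    def __init__(self):
        self.holders: Dict[Hashable, Dict[str, str]] = {}  # resource -> {owner: mode}

    def acquire(self, owner: str, resource: Hashable, mode: str) -> bool:
        held = self.holders.get(resource, {})
        if any(not COMPATIBLE[(m, mode)] for o, m in held.items() if o != owner):
            return False                   # conflict: the caller must wait or abort
        if held.get(owner) != "X":         # never downgrade an X lock already held
            self.holders.setdefault(resource, {})[owner] = mode
        return True

    def release(self, owner: str, resource: Hashable) -> None:
        held = self.holders.get(resource, {})
        held.pop(owner, None)
        if not held:
            self.holders.pop(resource, None)
```

This sketch treats a table and its rows as unrelated resources. Combining table-wide and row-wide locks correctly usually also requires intention locks (IS/IX) on the table.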

Reimplementing performance critical functions - Task #13

Profile the performance-critical functions, pick the slowest/most used ones and reimplement them in a compiled language (C/C++/Rust). For example, reimplementing the file I/O (saving files in a format that is not binary) would be interesting.

Dashboard - SQL command parser

Design a simple, minimal dashboard that shows the meta-tables and/or stats for each database and supports the execution of SQL queries (REPL-like).

File I/O - Not binary

Design a scheme for the database/table files that is closer to what traditional RDBMSs use. Test this new scheme and add it to the library only if there is no major performance penalty.
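Here is one possible non-binary scheme, sketched under our own assumptions. It uses a JSON header line holding the schema (column names and types), followed by one CSV line per row. The type names and both function names are hypothetical, and `None` values are not handled.

```python
import csv
import json
from pathlib import Path
from typing import Sequence

CASTS = {"int": int, "float": float, "str": str}  # hypothetical type names


def save_table_text(path: Path, columns: Sequence[str], types: Sequence[str],
                    rows: Sequence[Sequence]) -> None:
    """Write a JSON schema header line, then one CSV line per row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(json.dumps({"columns": list(columns), "types": list(types)}) + "\n")
        csv.writer(f).writerows(rows)


def load_table_text(path: Path):
    """Read the header, then cast every CSV field back to its column type."""
    with open(path, newline="", encoding="utf-8") as f:
        header = json.loads(f.readline())
        casts = [CASTS[t] for t in header["types"]]
        rows = [[cast(v) for cast, v in zip(casts, rec)] for rec in csv.reader(f)]
    return header["columns"], header["types"], rows
```

Because the format is line-oriented text, it can be inspected with any editor and benchmarked directly against the current .pkl files before deciding whether to adopt it.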

Ordered Files/Tables - Task #1

Implement sorted files/tables as well as the following:

  1. Add support for a separate insert stack/queue. Selects should work (i.e. should scan both the table and the stack).

  2. Add a method that inserts all insert stack records to the original table when the stack gets too big.

  3. Add binary search support
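The following compact sketch covers all three points. It assumes rows are tuples sorted on a single key column. Once the buffer exceeds `max_buffer`, it is flushed by a true merge (`heapq.merge`). Equality selects binary-search the sorted part with `bisect` and scan the small buffer linearly. `SortedTable` and its parameters are hypothetical names.

```python
import bisect
import heapq
from typing import List


class SortedTable:
    """Rows kept sorted on one key column, plus an unsorted insert buffer (stack)."""

    def __init__(self, key: int = 0, max_buffer: int = 4):
        self.key, self.max_buffer = key, max_buffer
        self.rows: List[tuple] = []    # always sorted by row[key]
        self.keys: List = []           # parallel list of keys, used by bisect
        self.buffer: List[tuple] = []  # recent inserts, not yet merged

    def insert(self, row: tuple) -> None:
        self.buffer.append(row)
        if len(self.buffer) > self.max_buffer:  # (2) flush when the stack gets too big
            self.flush()

    def flush(self) -> None:
        """Sort the small buffer, then merge it with the sorted rows in one pass."""
        by_key = lambda r: r[self.key]
        self.rows = list(heapq.merge(self.rows, sorted(self.buffer, key=by_key), key=by_key))
        self.keys = [r[self.key] for r in self.rows]
        self.buffer.clear()

    def select_eq(self, value) -> List[tuple]:
        """(1) and (3): binary search the sorted part, linear scan of the buffer."""
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return self.rows[lo:hi] + [r for r in self.buffer if r[self.key] == value]
```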

Distinguish source code from examples

It appears that the true package dependencies are database.py, table.py and btree.py, whereas the rest of the files are sample applications of miniDB. I would suggest moving the example scripts to a new directory (e.g. /examples) in order to distinguish them from the source code of miniDB.

Server-Client protocol

Design and implement a minimal server-client communication protocol. Since such an implementation can be very costly in terms of performance, whether (and to what extent) it will be added will be decided once the performance penalty has been evaluated.
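Because TCP delivers a byte stream with no message boundaries, even a minimal protocol needs framing. A common lightweight choice is a fixed-size length prefix, which adds only 4 bytes per message. The sketch assumes JSON payloads; the message fields and function names are hypothetical.

```python
import json
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length in network (big-endian) byte order


def encode_message(payload: dict) -> bytes:
    body = json.dumps(payload).encode("utf-8")
    return HEADER.pack(len(body)) + body


def decode_messages(buffer: bytearray) -> list:
    """Pop every complete message from buffer; leave a partial one for the next call."""
    messages = []
    while len(buffer) >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer)
        if len(buffer) < HEADER.size + length:
            break  # the rest of this message has not arrived yet
        body = bytes(buffer[HEADER.size:HEADER.size + length])
        del buffer[:HEADER.size + length]
        messages.append(json.loads(body))
    return messages
```

The receiver appends whatever `recv()` returns to one `bytearray` and calls `decode_messages` after each read, so messages split across reads are reassembled correctly.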

Add quicklook functionality

Add a quicklook()/preview() functionality (matplotlib-based) to plot the size of each DB as a percentage (e.g. as a pie chart) to help the user navigate/get an idea of the distribution of data and information across their DB system in a quick and simple manner.

CI/CD pipelines

Add CI/CD pipelines (preferably using GitHub Actions) and add Jenkins for extra functionality. Also make passing unit tests mandatory for pull requests to be merged.

Cost-based optimization - Task #5

Think and implement an effective way of measuring the cost of the various IO functions (inserts, deletes etc).

Measuring run times is an option, but it should be added as a separate tool and not by altering all the existing methods. You should also create/use a tool that analyzes the runtime as extensively as possible.
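Here is a sketch of such a separate tool. It wraps chosen methods of a class from the outside, so the existing methods stay untouched. It assumes the wrapped attributes are plain instance methods (not static or class methods). `Profiler` is a hypothetical name.

```python
import functools
import time
from collections import defaultdict


class Profiler:
    """Times selected methods by wrapping them externally; the methods are not edited."""

    def __init__(self):
        self.timings = defaultdict(list)  # method name -> list of durations in seconds

    def _wrap(self, name, original):
        @functools.wraps(original)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return original(*args, **kwargs)
            finally:
                self.timings[name].append(time.perf_counter() - start)
        return wrapper

    def instrument(self, cls, *method_names):
        for name in method_names:
            setattr(cls, name, self._wrap(name, getattr(cls, name)))

    def report(self):
        """name -> (number of calls, total seconds, mean seconds)."""
        return {name: (len(ts), sum(ts), sum(ts) / len(ts)) for name, ts in self.timings.items()}
```

For a more exhaustive analysis, the standard library's `cProfile` and `pstats` modules can profile a whole workload, including calls into helper functions.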

Make miniDB dependency free

Removing the last two dependencies (numpy will be removed ASAP) would be VERY nice.

This means:

  • Replacing tabulate with a hand made table print function
  • Detecting whether Graphviz is installed and either saving the dot file (if it is not) or plotting (if it is). Graphviz would still be recommended since it is very useful for plotting btrees, but miniDB would be fully functional without it.
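Below is a dependency-free sketch of both points. `format_table` is a minimal stand-in for tabulate that only left-aligns columns. `render_dot` assumes that "Graphviz is installed" means the `dot` executable is on PATH, which it detects with `shutil.which`. Both names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Sequence


def format_table(headers: Sequence[str], rows: Sequence[Sequence]) -> str:
    """Plain-text replacement for tabulate: left-aligned columns and a rule line."""
    cells = [[str(h) for h in headers]] + [[str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    lines = ["  ".join(c.ljust(w) for c, w in zip(r, widths)).rstrip() for r in cells]
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)


def render_dot(dot_source: str, out_stem: Path) -> Path:
    """Always save the .dot file; render a PNG only if Graphviz's `dot` is on PATH."""
    dot_file = out_stem.with_suffix(".dot")
    dot_file.write_text(dot_source, encoding="utf-8")
    if shutil.which("dot") is None:
        return dot_file  # Graphviz missing: the user can render the file later
    png_file = out_stem.with_suffix(".png")
    subprocess.run(["dot", "-Tpng", str(dot_file), "-o", str(png_file)], check=True)
    return png_file
```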

Suppress prints

Replace prints with logging.info, logging.warning (or warnings.warn) and raise (or logging.exception) for information, warnings and errors respectively. The choice between the alternatives in parentheses should be made case by case, depending on the type of warning/error the user is presented with.
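The next sketch covers the three cases. It is loosely based on the `is_locked` method quoted in an earlier issue. The `locks` and `tables` dicts stand in for miniDB's meta-tables and are hypothetical, as is the logger name.

```python
import logging
import warnings

logger = logging.getLogger("minidb")  # hypothetical logger name, configured by the application


def is_locked(locks: dict, table_name: str) -> bool:
    """Information: log instead of printing; the caller decides what to show."""
    locked = locks.get(table_name, False)
    if locked:
        logger.info('Table "%s" is currently locked.', table_name)
    return locked


def drop_table(tables: dict, table_name: str) -> None:
    if table_name not in tables:
        # Error: raise, so the caller can handle it instead of the failure being hidden.
        raise KeyError(f'Table "{table_name}" does not exist.')
    if not tables[table_name]:
        # Warning: the operation still succeeds, but the user may want to know.
        warnings.warn(f'Table "{table_name}" is already empty.', stacklevel=2)
    del tables[table_name]
```

With this split, an application that wants the old behaviour can call `logging.basicConfig(level=logging.INFO)`, while library users are no longer flooded with prints.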

Privileges - Task #11

Add support for different types of users with different privileges. Specifically:

  • Implement different types of users and groups (with different privileges)

  • Different types per DB and per Table

Hint: Create new meta tables that contain users, types etc. Authentication is a nice extra but is not necessary.
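Following the hint, here is a hypothetical sketch of two meta-tables and a privilege check. One meta-table maps users to groups; the other stores grants per group, database and (optionally) table. The meta-table layouts and names are assumptions, and authentication is left out.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical meta-tables, modelled as plain Python structures.
meta_users: Dict[str, str] = {"alice": "admins", "bob": "analysts"}  # user -> group
# (group, database, table or None for "whole database") -> granted privileges
meta_privileges: Dict[Tuple[str, str, Optional[str]], Set[str]] = {
    ("admins", "university", None): {"select", "insert", "update", "delete", "create", "drop"},
    ("analysts", "university", "students"): {"select"},
}


def is_allowed(user: str, action: str, database: str, table: Optional[str] = None) -> bool:
    """Check a table-level grant first, then fall back to a database-wide grant."""
    group = meta_users.get(user)
    if group is None:
        return False
    for scope in ((group, database, table), (group, database, None)):
        if action in meta_privileges.get(scope, set()):
            return True
    return False
```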

Task 3.4 - Reimplementing performance critical functions

A. Approach
Based on building a DLL implemented in C#, which is called from a Python file. The implemented functions are found in the C# file and are based on minidb's database file. The code is attached in folder A.
A.zip

B. Approach
Based on Python procedures that convert a binary file into a non-binary format and read that file back. The non-binary file can even be a plain txt file. The reverse conversion is also provided, as well as how the Python files are called for minidb's database file. The program is driven by the main.py file.

The code is attached in folder B.
B.zip
@giorgostheo @0xCoto

Fix exception catching (prone to ignoring unrelated errors)

Exceptions in database.py are caught and passed, regardless of what the exception actually is. This is a problem because a different exception may be raised on a different system or under certain conditions, and the problem would then be silently ignored.

This issue appears on lines 26, 37 and 566.

A simple fix for this could be:

except Exception_name:
    print(...)

(Also, if the except block already contains a statement (e.g. print()), you don't need to append pass; pass is only required when the block would otherwise be empty.)
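Here is a concrete version of the suggested fix. It assumes, hypothetically, that the guarded code loads a pickled table, and it handles only the expected `FileNotFoundError`. `load_table` is an illustrative name.

```python
import pickle
from pathlib import Path


def load_table(path: Path):
    """Load a pickled table, handling only the error we actually expect."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        # Expected and recoverable: the table has simply not been saved yet.
        print(f'No saved table at "{path}".')
        return None
    # Any other error (PermissionError, a corrupt pickle, ...) propagates to
    # the caller instead of being silently swallowed by a blanket except/pass.
```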

Await pull request...

Logo

Design a logo... that's it

Documentation - Read the Docs

Document the current codebase both in code and externally. Since Read the Docs is the de facto standard for open-source projects (especially in Python), use that. This needs to be ongoing, since any and all new code needs to be documented. Self-hosting is not a top priority, but it can happen.

Performance evaluation method implementation

Implement a performance evaluation scheme that offers quantitative results for commands like select, insert etc. Currently, “select” with a btree returns the number of reads/comparisons; use that as a frame of reference.

Journals - Logs

Implement journals that keep track of the previously executed commands and can rollback to a previous state. Also add support for more verbose logging.

In-Memory miniDB

Add functionality so that miniDB can be used as an in-memory-only database, e.g. for session management.
