datastories-unipi / minidb
An extremely minimal DB that can be used for educational purposes and rapid prototyping
License: GNU General Public License v3.0
Analyse the current implementations of the available I/O commands and rewrite parts that are not optimised. Ease of understanding should not be sacrificed.
Implement a server-client architecture that uses ports in order for the client to send queries and receive results. Also, implement a very minimal sql compiler that just supports a select query (select * from is enough) in order to show the server-client functionality.
This can be implemented using Golang (easier for this kind of thing) or Python.
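A minimal Python sketch of the server-client idea described above, assuming a plain TCP socket, one newline-terminated query per connection, and an in-memory dictionary standing in for a miniDB database. The names TABLES, run_query, QueryHandler, send_query and start_server are hypothetical and not part of miniDB.

```python
import socket
import socketserver
import threading

# Hypothetical in-memory stand-in for a miniDB database: table name -> rows.
TABLES = {"classroom": [("Packard", 101, 500), ("Painter", 514, 10)]}


def run_query(query):
    """Handle only 'select * from <table>'; anything else is rejected."""
    words = query.strip().rstrip(";").split()
    if len(words) == 4 and [w.lower() for w in words[:3]] == ["select", "*", "from"]:
        rows = TABLES.get(words[3])
        if rows is None:
            return f"error: unknown table {words[3]}"
        return "\n".join(",".join(str(v) for v in row) for row in rows)
    return "error: unsupported query"


class QueryHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One query per connection, terminated by a newline.
        query = self.rfile.readline().decode("utf-8")
        self.wfile.write(run_query(query).encode("utf-8"))


def send_query(host, port, query):
    """Client side: send one query and read until the server closes the socket."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(query.encode("utf-8") + b"\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 lets the OS pick a free port."""
    server = socketserver.ThreadingTCPServer((host, port), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_server() and then send_query(*server.server_address, "select * from classroom") returns the rows as comma-separated lines. A real implementation would swap run_query for a call into the actual database object.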
Implement an SQL compiler that supports the following queries:
Current savedir structure is:
current_dir/dbdata/
├── db1/
│ ├── table1.pkl
│ ├── table2.pkl
│ ├── ...
├── db2/
│ ├── table1.pkl
│ ├── table2.pkl
│ ├── ...
Changing this to a more standardized structure (for example, moving it to /home/ and making it hidden), as well as including a setup script that creates the tree, would be nice.
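A possible setup script could look like the sketch below. It assumes a hidden folder named .minidb inside the user's home directory; both the folder name and the create_savedir function are illustrative choices, not something miniDB defines.

```python
from pathlib import Path


def create_savedir(db_names, base=None):
    """Create <base>/.minidb/<db>/ for every database name and return the root.

    The hidden '.minidb' folder name is an assumption, not a miniDB convention.
    """
    root = (Path(base) if base is not None else Path.home()) / ".minidb"
    root.mkdir(parents=True, exist_ok=True)
    for name in db_names:
        # exist_ok makes the script safe to run more than once.
        (root / name).mkdir(exist_ok=True)
    return root
```

Running create_savedir(["db1", "db2"]) would create ~/.minidb/db1/ and ~/.minidb/db2/, leaving existing folders untouched.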
Add a usable SQL compiler that follows the compiler literature as much as possible. The SQL compiler needs to be easy to grasp and extend, since any newly added functionality will need to be accompanied by a new query.
Integrate MiniDB into the popular ORM (Object Relational Mapping) tools.
Right now the main source files (database, table) are in the same directory as the others (btree and misc files). This should change to make the project more readable and, eventually, lead to PyPi.
I will read into this and maybe update the issue down the road.
Implement the following:
Additionally rethink the way the primary key is currently stored and reimplement it in a more concise and effective way.
Finally, add support for multicolumn primary key.
Add the needed github actions and CI/CD tools to make contributing to the project easier. Some examples include Unit Tests, a linter etc.
Add a GUI admin panel.
Implement the following:
As well as the following:
Add a Docker config (Dockerfile, Docker-compose) with a volume that stores data on the host machine, so that it is runnable under Win10.
Upload to PyPi when all/most of the story issues functionality is added.
Add a linter to the project to ensure code quality. Preferably use Flake8 or Pylama (or any other linter).
Implement the following types of Joins:
The "intended" way of installing miniDB is currently through git (cloning and running an example .py file).
This poses a couple of problems, which are directly addressed if the project is pushed to PyPi:
[…] pip, you can simply run pip install -U mini-db *)
[…] ~/Desktop) - it'd be a bit of a mess!). With pip, it's as simple as pip install mini-db and you don't have to worry about "where should I install the package"
* minidb is taken, but mini-db is an available package name on PyPi! 😁
** This is because from database import Database tries to find database.py in your current path, so you essentially need to have all your scripts in the same directory, or import database will fail.
Replace the current structure with one that is more professional and widely used + makes the repo easier to follow and understand.
This repo contains such info and can be useful
Make a server for MiniDB. Maybe make a custom server or use an existing one.
Are lines 428/429 in database.py necessary?
res = self.select('meta_locks', ['locked'], f'table_name=={table_name}', return_object=True).locked[0]
if res:
    print(f'Table "{table_name}" is currently locked.')
return res
Unless I'm missing something, the is_locked(self, table_name) function is supposed to return a boolean value depending on whether the input table is exclusively locked or not, but is it meaningful to also inform the user from within the function (either via print or logging.info), i.e. twice?
If not, I'll make sure my (future) PR will include this fix along with #38 & #47.
Refactor the existing Btree implementation and:
Add Btree delete
Add support for multicolumn Btree indexes
Add support for distinct values in the SELECT clause by implementing Sort deduplication (if deduplication is needed).
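Sort deduplication can be sketched in a few lines of Python. The function name distinct_sorted is hypothetical, and rows are assumed to be tuples whose values can be compared with each other.

```python
def distinct_sorted(rows):
    """Return the distinct rows in sorted order using sort-based deduplication."""
    result = []
    for row in sorted(rows):
        # After sorting, duplicates are adjacent, so one comparison is enough.
        if not result or row != result[-1]:
            result.append(row)
    return result
```

The sort dominates the cost (O(n log n)). If the rows already arrive sorted, for example from an index scan, the sorted() call can be skipped and only the adjacent comparison remains.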
Implement a logging/journaling system (WAL like) for each db/table. It should support:
Implement hash index support as well as:
Hash indexes that support different hash functions/keys
Hash search
Hash Join
Hash Visualization tool
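As an illustration of the hash join item above, here is a minimal in-memory inner equi-join. It assumes rows are plain dictionaries rather than miniDB table objects, and hash_join is a hypothetical name.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner equi-join of two lists of dict rows using a hash table on `left`."""
    buckets = defaultdict(list)
    for row in left:
        # Build phase: group the left rows by their join key.
        buckets[row[left_key]].append(row)
    joined = []
    for row in right:
        # Probe phase: look up each right row's key in the hash table.
        for match in buckets.get(row[right_key], []):
            # On a column-name clash the right row's value wins.
            joined.append({**match, **row})
    return joined
```

Building the hash table on the smaller input keeps memory use down; in this sketch the caller decides that by choosing which table to pass as left.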
Add an API for other languages like Java, JS, C#, ...
Although Python 2 is officially deprecated, many users still use it. Since miniDB (essentially database.py) does not appear to be heavily Py3-dependent, how about we make it Py2/3 compatible? It doesn't seem like a lot of work, but if it's been tested, which code segments are incompatible with Py2 and require fixing?
Implement the following types of locks:
MiniDB currently supports only one lock type, the exclusive table wide lock. Implement the rest of the types listed above and make sure that all the needed changes to the corresponding meta-tables are made.
Analyze the performance of various performance critical functions, pick the slowest/most used ones and reimplement them using a compiled language (C/C++/Rust). For example, implementing the file IO (saving files to something that is not binary) should be interesting.
Design a simple and minimal dashboard that shows meta-tables and/or stats for each database and support the execution of SQL queries (REPL like).
Design a scheme for the database/table files that is closer to what traditional rdbms’ use. Test this new scheme and add to lib only if there is no major performance penalty.
Implement sorted files/tables as well as the following:
Add support for a separate insert stack/queue. Selects should work (i.e. should scan both the table and the stack).
Add a method that inserts all insert stack records to the original table when the stack gets too big.
Add binary search support
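The three items above (insert stack, merging the stack into the table, binary search) could fit together roughly as follows. SortedTable and its methods are hypothetical, rows are looked up by a single key, and the standard-library bisect module does the binary search.

```python
import bisect


class SortedTable:
    """Rows kept sorted by key, plus an unsorted insert stack (hypothetical design)."""

    def __init__(self, max_stack=3):
        self.keys = []      # sorted keys of the main table
        self.rows = []      # rows aligned with self.keys
        self.stack = []     # recent (key, row) inserts, unsorted
        self.max_stack = max_stack

    def insert(self, key, row):
        """Push onto the stack; merge once the stack grows past max_stack."""
        self.stack.append((key, row))
        if len(self.stack) > self.max_stack:
            self.merge_stack()

    def merge_stack(self):
        """Move every stacked record into the sorted table and empty the stack."""
        for key, row in self.stack:
            pos = bisect.bisect_right(self.keys, key)
            self.keys.insert(pos, key)
            self.rows.insert(pos, row)
        self.stack = []

    def select(self, key):
        """Binary-search the sorted part, then scan the stack linearly."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        found = self.rows[lo:hi]
        found += [row for k, row in self.stack if k == key]
        return found
```

Selects stay correct while records sit in the stack because select checks both places. The trade-off is that a larger max_stack makes inserts cheaper and selects slower.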
It appears that the true package dependencies are database.py, table.py and btree.py, whereas the rest of the files are sample applications of miniDB. I would suggest moving the example scripts to a new directory (e.g. /examples) in order to distinguish them from the source code of miniDB.
Design and implement a minimal server-client communication protocol. Since such an implementation can be very taxing with regards to performance, whether this will be added and to what extent will be discussed when the performance penalty added is evaluated.
Add a quicklook()/preview() functionality (matplotlib-based) to plot the size of each DB as a percentage (e.g. as a pie chart) to help the user navigate/get an idea of the distribution of data and information across their DB system in a quick and simple manner.
Add unit tests. Add functional tests. Preferably add also full end-to-end tests.
Add CI/CD pipelines (preferably using Github actions) and add Jenkins for extra functionality. Also make passing unit tests mandatory for pull requests.
Think and implement an effective way of measuring the cost of the various IO functions (inserts, deletes etc).
Measuring run times is an option, but it should be added as a separate tool and not by altering all the existing methods. You should also create/use a tool that analyzes the runtime as extensively as possible.
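One way to keep the measurement in a separate tool is to wrap the methods from the outside at run time. The CallTimer class below is a hypothetical sketch of that approach; it records wall-clock time only and leaves the source of the measured class unchanged.

```python
import time
from collections import defaultdict
from functools import wraps


class CallTimer:
    """Collect wall-clock run times of methods without editing their source."""

    def __init__(self):
        self.samples = defaultdict(list)   # method name -> list of seconds

    def instrument(self, cls, method_names):
        """Replace the named methods on `cls` with timed wrappers."""
        for name in method_names:
            setattr(cls, name, self._wrap(name, getattr(cls, name)))

    def _wrap(self, name, func):
        @wraps(func)
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the wrapped method raises.
                self.samples[name].append(time.perf_counter() - start)
        return timed

    def report(self):
        """Return {method name: (number of calls, total seconds)}."""
        return {n: (len(s), sum(s)) for n, s in self.samples.items()}
```

For a more extensive breakdown, the standard-library cProfile module can profile a whole script without any code changes (python -m cProfile script.py).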
Add project-wide configuration.
Add a logical directory tree to the source code
Removing the last 2 deps (numpy will be removed asap) would be VERY nice.
This means:
Replace prints with logging.info, logging.warning (or warnings.warn) and raise (or logging.exception) for information, warnings and errors respectively. The parentheses (i.e. "which one should I use?") go case-by-case, depending on the type of warning/error the user is presented with.
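A small sketch of how the three levels might map onto one operation. The drop_table function, the dictionary of tables and the "minidb" logger name are all made up for illustration and do not mirror miniDB's real code.

```python
import logging

logger = logging.getLogger("minidb")   # logger name is an assumption


def drop_table(tables, name):
    """Remove `name` from the `tables` dict, reporting through logging."""
    if name not in tables:
        # An error the caller must handle: raise instead of printing.
        raise KeyError(f'Table "{name}" does not exist.')
    if tables[name]:
        # Unusual but recoverable: warn.
        logger.warning('Dropping non-empty table "%s".', name)
    del tables[name]
    # Routine progress message: info.
    logger.info('Dropped table "%s".', name)
```

An application that wants to see the info messages can call logging.basicConfig(level=logging.INFO); by default only warnings and above are shown, which keeps library output quiet.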
Add generic and reusable methods for frequently duplicated code blocks.
Add an SQL interpreter to MiniDB.
Add support for different types of users with different privileges. Specifically:
Implement different types of users and groups (with different privileges)
Different types per DB and per Table
Hint: Create new meta tables that contain users, types etc. Authentication is a nice extra but is not necessary.
A. Approach
Based on building a DLL implemented in C#, which is called from a Python file. The implemented functions are presented in the C# file and are based on the database file of minidb. The code is attached in folder A.
A.zip
B. Approach
Based on creating Python procedures that convert a binary file to a non-binary format and read that file back. The non-binary file can even be a txt file. The reverse path is also presented, as well as the calls to the Python files for the database file of minidb. The main.py file is used to manage this program.
The code is attached in folder B.
B.zip
@giorgostheo @0xCoto
Exceptions in database.py are caught and passed, regardless of what the exception actually is. This is a problem because you may get a different exception on a different system under certain conditions, and the problem may be incorrectly ignored.
This issue appears in line 26, line 37 and line 566.
A simple fix for this could be:
except Exception_name:
    print(...)
(Also, if you have a command under the exception (e.g. print()), you don't need to append pass to break out of the exception and continue.)
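To make the suggestion concrete, here is a hedged sketch of catching one named exception. The load_table function is hypothetical and is not the code at the lines mentioned above; it only assumes that tables are stored as pickle files.

```python
import pickle


def load_table(path):
    """Load a pickled table, handling only the failure we expect."""
    try:
        with open(path, "rb") as handle:
            return pickle.load(handle)
    except FileNotFoundError:
        # Expected when the table has not been saved yet: report it and carry on.
        print(f"{path} not found, no table loaded.")
        return None
    # Any other exception (e.g. a corrupt file) propagates to the caller.
```

With a bare except followed by pass, a corrupt or unreadable file would be silently treated the same as a missing one. Here it surfaces as an error instead.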
Await pull request...
Design a logo... that's it
Document the current codebase both in code and externally. Since Read the Docs is the standard for open source (especially python), use that. This needs to be ongoing since any and all new code needs to be documented. Self hosting is not a top priority, but it can happen.
Implement a performance evaluation scheme that offers qualitative results for commands like select, insert etc. Currently, “select” with btree returns the number of reads/comparisons. Use that as a frame of reference.
Implement journals that keep track of the previously executed commands and can rollback to a previous state. Also add support for more verbose logging.
Add functionality to miniDB for use as an in-memory only db for use in session management, etc