This implementation puts a twist on the Standard Treap (randomized search tree) in the sense that I add the idea of self-organizing data structures on top, To implement a Database (collection of rows that are somehow related) Indexing.
A self-organizing data structure is a data structure that has some rule(s) or algorithm attached to the access method for each node in the data structure.
These rule(s) or algorithms result in changing the structure or order of the data structure in an attempt to speed up the problem of searching for data in the data structure [AW05].
The access method is a function or function that searches for the data in the data structure and returns the data.
The main goal of a self-organizing data structure is to move away from a lookup time of O(n) and towards an instant lookup time of O(1). Examples of self-organizing data structures are self-organizing lists, binary trees and m-way trees.
Treap
A treap is a variation of a randomized search tree algorithm. Treaps differ from standard binary search trees in the following ways:
Node:
Nodes in a standard binary search tree only has a data member which stores the data of the node.
This differs from treaps in the sense that nodes has an additional member which is the priority of the node.
Properties:
Standard Binary Search trees only maintain the property that data that is smaller than the current node's data is part of the leftsubtree of the node and data that is greater than the current node's data are part of the rightsubtree of the node.
With treaps, an additional property is introduced. The max heap property. This property states that the data of node n is greater or equal to either of its children[ASSS86].
Treaps combine the idea of Heaps (where the aforementioned property originated from) and Binary SearchTrees. Thus Treaps maintain the max heap property of heaps as well as the efficient search property of Binary Search Trees.
The max heap property is applied to the priority of each treap node and the binary search is applied to the data of each treap node.
Treaps assignpriorities randomly to each node which originates from the randomized search tree algorithm. This allows for the average case of the treap to be perfectly balanced[SA96].
The standard implementation of a treap is to use an array but for this implementation I will be implementing the treap as a tree.
Self-organizing Treaps
This implementation will also put a twist on the standard treap in the sense that we add the idea of self-organizing data structures on top of the treap.
Thus with each access of a node, the priority of the node will increase and a rotation may be required to maintain the max heap property.
Database
A database is usually a collection of rows that are somehow related to each other.
It is possible to have unstructured databases (e.g. NoSQL) databases but for this implementation, I will be implementing a structured database.
Structured Database
A structured database refers to a database that has a strict structure that cannot be changed or will result in possibly undefined behavior.
A database's structure is described by a set of properties the data has which will be called columns. Rows in the database are a grouping of data that has all the properties and that are closely related to each other.
Thus each row's properties are somehow closely related to each other. Usually, databases have multiple tables to model the real world more accurately.
Due to complexity, this implementation will only be implementing a single-tabled database. Databases usually have the following set of operations
Insert
Update
Delete
Search
And I will also implement these operations.
Indexing
Searching is one of the most important features of a database. A naive approach is to linearly search through the database for a record or possibly multiple records.
But this implementation will try to optimize this naive approach.
In databases especially relational databases is to index columns that are used to search through often.
Indexing implies building an externaldata structure that will increase the efficiency of the searching algorithm when searching for data. Usually, this is accomplished by using a B+ or B* tree.
This implementation will attempt to use a different data structure: Self Organizing Treaps. This will hopefully increase the efficiency of the searching algorithm over the linear searching algorithm
[ASSS86] Michael D Atkinson, J-R Sack, Nicola Santoro, and Thomas Strothotte. Min-max heaps and generalized priority queues. Communications of the ACM, 29(10):996–1000, 1986.
[AW05] Susanne Albers and Jeffery Westbrook. Self-organizing data structures. Online Algorithms: The state of the art, pages 13–51, 2005.
[SA96] Raimund Seidel and Cecilia R Aragon. Randomized search trees. Algorithmica, 16(4-5):464–497, 1996
UML DIAGRAM:
EXAMPLE SQL QUERIES AND EQUIVALENT JAVA COODE:
Example Table Structure
Example 1
Java
Database(["ID","Computer Scientist Name","Research Area","Year of First Publication"], 15)
SQL Query
CREATEDATABASEIF NOT EXISTS computer_scientists;
USE computer_scientists;
CREATETABLEIF NOT EXISTS computer_scientists (
id INTNOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
research_area VARCHAR(255) NOT NULL,
year_first_publication INTNOT NULL
);