Code Monkey home page Code Monkey logo

building-a-database's Introduction

About

This repository contains an implementation of a very simple database engine, written in Go. (In less technical terms, it's not just having a database of specific information. The database engine is what the data goes into and stores it - kind of like building a simple version of Excel, but with more versatility and fewer handy buttons.)

This is my senior capstone project as a Computer Science major at Bryn Mawr College. I'm building this database in order to understand how the inside of a database work, from reading and writing to disk all the way to parsing SQL queries and optimizing how to get the answer most efficiently. I do a lot of work with data analysis and visualization, so I wanted to get to know the underlying architecture that I rely on so often. It may also be, I hope, a useful resource for some burgeoning database engineer out there, to modify and play with and pick apart much more easily than the databases that we generally use.

If you know anyone who would be interested in hiring a newly-graduated CS student who does data or visual or coding stuff, I would love it if you sent them my resume!

Status

These are the current features of the database engine. The numbers refer to the corresponding chapter in the textbook I'm using as a basis (Database Design and Implementation, by Edward Sciore). The same numbers are also prefixed to names of files containing the code for the corresponding feature. This means that the order of the files is in rising complexity, and later files generally build on the objects of earlier ones.

  • [3] Files are divided up into blocks of a size matching what the CPU uses when writing/reading from the disk. These blocks can read and write integers, strings, and generic byte objects (blobs). Blocks are accessed independently of each other, so reading or writing one block does not require reading or writing the rest of the file.
  • [3,4] To minimize disk reads and writes, currently-in-use blocks are stored in pages held in a buffer pool, which holds pages in use but also, if there is any room left over, keeps recently-used blocks around (i.e. caches them) and does not write to the disk until required.
  • [4,5] The database has a log manager, which records each change to the database as they occur, so that returning to a previous state is possible (whether intentionally by reverting a change, or for restoring if the database crashes). These changes are grouped into distinct transactions, one from each concurrent user, so that users do not interfere with each other and so that changes are not officially made until the user sends a signal to commit. (If a user crashed unexpectedly, having a partially-done set of changes could be very bad!)
  • [5] Note that proper concurrency safety is not implemented at the moment, and is one of the reasons that commonly used databases are so much more confusing to take apart.
  • [6] Database records are stored in record pages, which can be retrieved from inside blocks. The way they are stored and accessed is determined by the schema and layout of each table. This is essentially what the fields (columns) of the table are called and what type they store (e.g. integer, string, blob). Accessing or modifying the record pages is done with a table scan.
  • [7] The database keeps track of the tables via a table manager, which keeps lists of all tables and their fields (including field types).
  • [7] The database keeps track of basic statistics about the tables, in order to significantly optimize queries.
  • [8] The database has functions it can use to perform relational algebra on its data (e.g. it can filter a table based on a given criteria and return the shortened table).
  • [9] The database can parse (basic) SQL queries input as strings into the relational algebra components it is familiar with, and return accurate results.

Future work:

  • [5] The database is safe for concurrent users (i.e. addresses problems like two users trying to modify the same bit of data), and for recovery of the database should it be needed.
  • [11,ish] The database fits with the Go sql_driver interface standard and can be used as a package like a standard Go module.
  • [11,ish] The database is stored on a server and accessed from there, rather than stored directly on the client machine.
  • [12] The database uses indexes and B-trees to store and access data much more efficiently.

Resources

By far the most vital resource for me has been Database Design and Implementation, by Edward Sciore.

building-a-database's People

Contributors

isabelle-sanford avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.