Code Monkey home page Code Monkey logo

cs764-external-merge-sort's Introduction

Deliverables:

Your submission should be a zip file that includes:

Your full source code with clear documentations

A makefile that compiles all your executables

And README file that includes:

Each group member's names and student IDs

The techniques (see Grading) implemented by your submission and the corresponding source files and lines

A brief writeup of (1) the reasons you chose to implement the specific subset of techniques (2) the project's state (complete or have what kinds of bugs) (3) how to run your programs if they are different from the default specifications above. This is not expected to be a lengthy document, just to assist our understandings of your work.

Each group member's contributions to the project

Table of Contents

Group Members:

Project Group 15 on Canvas. Following are the names along with the ID numbers

  • Natan Lidukhover - 907 934 3324
  • Shivam Hire - 908 272 5129
  • Smit Shah - 908 587 6440
  • Tyler Wilson - 908 622 6520

Techniques Implemented:

  • Quicksort(Sort.cpp::quickSort()): We chose to implement quicksort for sorting smaller sized 500KB runs because its average time complexity is O(n * log(n)).
  • Tournament trees(tol.cpp::Class TOL): When using a tree of losers, all the records with which the winner has to be compared to lie on a path from the winner leaf to the root. It is very easy to find the next winner by calculating parent node indices and performing the necessary comp[arison].
  • Run size > memory size (Main.cpp::sortHddAndStore()): We do use run size greater than DRAM size for sorting large amounts of data to speed up sorting
  • Offset-value coding (tol.cpp::Class TOL): Helps in optimizing comparisons and improves time to run external merge sort
  • Minimum count of row & column comparisons (tol.cpp::Class TOL): Implemented in TOL (tree of losers) and offset-value coding to compare rows and find the first distinct column
  • Cache-size mini runs (Main.cpp::sortDramandStore): We are generating cache-sized mini runs and then quick sorting them
  • Device-optimized page sizes(Main.cpp::blockSize variable): Using device-optimized page sizes, we can use the entire bandwidth of the device rather than constraining it
  • Spilling memory to SSD(Main.cpp::sortSsdandStore()): When the memory is full in DRAM, we utilize the SSD to store sorted runs and then merge them using TOL (tree of losers) and output the sorted output to SSD
  • Spilling from SSD to disk(Main.cpp::sortHddAndStore()): When the memory is full in the SSD merge step, we utilize the HDD to store sorted runs and then merge them using TOL (tree of losers) and store it on disk
  • Graceful degradation into merging(Main.cpp::sortDramAndStore()): Final merge step in DRAM which ensures graceful degradation
  • Graceful degradation into merging beyond one step(Main.cpp::sortDramAndStore()): Final merge step in HDD which internally uses SSD and DRAM merges to gracefully degrade
  • Verifying sort order (verification.cpp::verify()): Helps to ensure that sort order is correct and that external merge sort is implemented
  • Verifying sets of rows and values(verification.cpp::verify()): Helps to ensure that the data is not corrupted and the input data matches with output data, and takes two individual binary files and (based off of designated row size and page size) verifies that all rows in the unsorted data are present in the sorted data, and that the sorted data is indeed sorted
  • Optimized merge patterns(Main.cpp::sortHddandStore): Implemented to make efficient use of all storage devices by pre-calculating the exact amount of DRAM needed plus the number of SSD and HDD runs that will be needed, and using DRAM->SSD->HDD as necessary. All the math calculating run sizes and buffer sizes implements the optimized merge patterns with three cases where either only DRAM, DRAM+SSD, or DRAM+SSD+HDD are needed.

Usage and discussion

Reasoning for techniques used

We used the offset-value coding and tree of losers technique in order to provide good efficiency and a minimal comparison count. The tree of losers uses an array-based structure and all node accesses are done by indexing and calculating indexes rather than traversing a linked list. We split up the main loop into three cases: DRAM sort only, DRAM + SSD sort, and DRAM + SSD + HDD sort in order to provide readable code while re-using code components for similar merging and sorting steps.

In the DRAM case, first the necessary output buffer size is set, and the remaining space (1-MB aligned) is used for the input buffer in order to maximize the amount that DRAM is used for quick access. Input is read in to DRAM from the specified input device. Cache-sized mini runs of 500KB are generated within DRAM and sorted using QuickSort due to its average case O(n * log(n)) time complexity and efficient O(log(n)) space complexity. These runs are then merged using tree of losers and output to the specified input device (case 1 is HDD).

In the SSD case, first the necessary output buffer size is set, and the remaining space (1-MB aligned) is pre-calculated for the input buffer to use in the next step. The number of 99MB (dramLimit - 1MB)-size runs is calculated, and the size of the last run (if it is less than the normal run size) is calculated. The above DRAM case is then run that many times but output to SSD instead of HDD. Then, the input buffer in DRAM that was previously pre-calculated is filled from the input device. Cache-sized mini runs of 500KB are generated within DRAM and sorted using QuickSort. These runs are then merged with the runs persisted on the SSD using tree of losers.

In the HDD case, first the necessary output buffer size is set, and the remaining space (1-MB aligned) is pre-calculated for the input buffer to use in the next step. The number of 9.9GB (ssdLimit)-size runs is calculated, and the size of the last run (if it is less than the normal run size) is calculated. The above SSD case is then run that many times but output to HDD. The above DRAM case is then run a maximal 100 times but output to SSD instead of HDD. Then, the input buffer in DRAM that was previously pre-calculated is filled from the input device. Cache-sized mini runs of 500KB are generated within DRAM and sorted using QuickSort. These runs are then merged with the runs persisted on the SSD and runs persisted on HDD using tree of losers. This all ensures that DRAM is used maximally before using SSD, then ensures that SSD is used maximally before using HDD.

Project state

The project is complete, including a bonus credit for implementing optimized merge patterns. There are no bugs that we have seen in the final product. The program does not take into account a specific number of columns (each row is considered one column). This is not really a limitation as the sorting process is the exact same, and the debug prints the same way, so this has no difference in the final output.

Running the program

The program can be run as follows (from the project base directory):

./Sort.exe -c [total number of records] -s [individual record size] -o [trace file]

To run the verification code rather than the sort process path, run it as follows. The sort process path must be run first in order to have data to verify at the necessary files (and the parameters must be identical):

./Sort.exe -c [total number of records] -s [individual record size] -o [trace file] -v

You can also optionally add a -d flag to run with debug mode turned on in order to get (first five bytes in each row) DRAM sorted run output to output/dram.txt, SSD sorted run output to output/ssd.txt (case 3), HDD sorted run output to output/hdd.txt (case 3), and final sorted output to output/testData.txt.

./Sort.exe -c [total number of records] -s [individual record size] -o [trace file] -d

Both -d and -v must be placed at the end of all the arguments, if present. Intermediate step sorted run output is available in the output folder in binary format as ssd.bin and hdd.bin.

Contributions

  • Natan Lidukhover: Setting up buffers for HDD, SSD, DRAM, and integrating TOL (tree of losers) in Main.cpp
  • Shivam Hire: Implementing TOL (tree of losers) data structure, Run class, and Storage cClass
  • Smit Shah: File I/O for random numbers, generating unsorted data, and SSD read/write data methods
  • Tyler Wilson: Random number generation and verification code for sorted and unsorted input

cs764-external-merge-sort's People

Contributors

smit-1999 avatar natanlidukhover avatar shivamhire123123 avatar tywilson12 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.